Reference: Assignment Requirements

This assignment implements a Lambda-based, event-driven system that monitors changes in an S3 bucket, records object size history in DynamoDB, and generates visualizations of the tracked data.

Quick Takeaway

This assignment is a two-stack AWS CDK application that defines its cloud infrastructure in Python code and deploys it through CloudFormation.

It provisions an S3 bucket and a Lambda-based size-tracking service that records object size history.
By using infrastructure as code, it enables automated deployment, reproducible environments, and simplified infrastructure lifecycle management.

The end-to-end flow is:
S3 object change -> EventBridge -> size_tracking Lambda -> DynamoDB -> plotting Lambda -> plot bucket

The practical workflow is:

verify AWS account and region
initialize or open the CDK project
create the Python virtual environment
run local synth checks
bootstrap if needed
deploy both stacks
run the demo flow and validate outputs
destroy or recover stacks when needed

The deployment pipeline is:
Python CDK code -> CloudFormation template -> AWS resources

1. What Is the Purpose of Assignment 3?

Assignment 3 keeps the Assignment 2 application behavior, but replaces manual AWS Console setup with AWS CDK.

That means:

the Lambda business logic still follows the same core workflow
infrastructure is now defined in Python CDK code
deployment, update, and cleanup should go through CDK and CloudFormation

The system still needs the same functional behavior:

S3 object changes happen in the tracked bucket
a size-tracking Lambda records bucket state into DynamoDB
a plotting Lambda generates a chart from recent history
a driver Lambda performs the demo sequence and triggers plotting

The driver sequence remains:

create assignment1.txt with Empty Assignment 1
update assignment1.txt to Empty Assignment 2222222222
delete assignment1.txt
create assignment2.txt with 33
call the plotting API

2. Architecture and Project Structure

2.1 Stack Design

This project currently uses two stacks:

StorageStack: creates the stateful resources first
LambdaStack: consumes them

2.2 Resource Ownership

`StorageStack`

Creates:

one data S3 bucket
one plot S3 bucket
one DynamoDB table
one GSI named GSI_SizeByBucket

`LambdaStack`

Creates:

size-tracking Lambda
plotting Lambda
driver Lambda
API Gateway REST API
EventBridge rule
CloudWatch log groups
IAM permissions

2.3 Lambda Responsibilities

`size_tracking_lambda.py`

lists objects in the tracked bucket
computes object_cnt and total_size
writes a history record to DynamoDB
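
As a minimal sketch of the three steps above, the function below turns a bucket listing into one history record. It assumes the listing looks like the "Contents" list returned by S3 ListObjectsV2 (one dict per object with a "Size" in bytes) and that `time` is stored as epoch seconds. The function name and the sample bucket name are hypothetical, and the actual write to DynamoDB is left out.

```python
import time
from typing import Any, Dict, List, Optional


def build_history_record(
    bucket_name: str,
    contents: List[Dict[str, Any]],
    now: Optional[int] = None,
) -> Dict[str, Any]:
    """Summarize a bucket listing into one history record."""
    return {
        "bucket_name": bucket_name,  # table partition key
        # table sort key; epoch seconds is an assumption, not from the repo
        "time": int(time.time()) if now is None else now,
        "object_cnt": len(contents),
        "total_size": sum(obj["Size"] for obj in contents),
    }


# Hypothetical listing: one 18-byte object in the tracked bucket.
record = build_history_record(
    "example-data-bucket",
    [{"Key": "assignment1.txt", "Size": 18}],
    now=1700000000,
)
print(record)
```

An empty listing gives object_cnt 0 and total_size 0, which is the state recorded after the delete step of the demo.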

`plotting_lambda.py`

reads recent history from DynamoDB
reads the historical maximum from the GSI
generates a matplotlib plot
uploads the plot to the plot bucket

`driver_lambda.py`

performs the demo file sequence
triggers the plotting API

Expected visible history: 0 -> 18 -> 27 -> 0 -> 2
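
The expected history can be checked by replaying the driver sequence against an in-memory stand-in for the bucket. This is only a sketch: the `replay` helper is hypothetical, and it assumes the leading 0 is the empty-bucket state before the first write.

```python
from typing import Dict, List, Tuple

# The driver sequence as (action, key, body) tuples.
DRIVER_STEPS: List[Tuple[str, str, str]] = [
    ("put", "assignment1.txt", "Empty Assignment 1"),
    ("put", "assignment1.txt", "Empty Assignment 2222222222"),
    ("delete", "assignment1.txt", ""),
    ("put", "assignment2.txt", "33"),
]


def replay(steps: List[Tuple[str, str, str]]) -> List[int]:
    """Return the total bucket size in bytes after each step."""
    bucket: Dict[str, bytes] = {}
    history = [0]  # assumed starting point: the bucket is empty
    for action, key, body in steps:
        if action == "put":
            bucket[key] = body.encode("utf-8")  # overwrite replaces the old object
        elif action == "delete":
            bucket.pop(key, None)
        history.append(sum(len(data) for data in bucket.values()))
    return history


print(replay(DRIVER_STEPS))  # [0, 18, 27, 0, 2]
```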

2.4 Project Layout

assignment3_project/
├── app.py
├── cdk.json
├── README.md
├── requirements.txt
├── assignment3_app/
│   ├── __init__.py
│   ├── storage_stack.py
│   └── lambda_stack.py
├── lambdas/
│   ├── size_tracking_lambda.py
│   ├── plotting_lambda.py
│   └── driver_lambda.py
├── scripts/
│   └── destroy_app.sh
└── tests/

3. Resource-Level Expectations

3.1 Data Model

The DynamoDB table stores history by bucket over time.

Current schema:

partition key: bucket_name
sort key: time

Current GSI:

name: GSI_SizeByBucket
partition key: bucket_name
sort key: total_size

This supports:

recent-history queries from the main table
historical-maximum queries from the GSI without using scan
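
The two access patterns can be sketched as keyword arguments for the DynamoDB Query operation (for example the boto3 client's `query` call). The helper names and the table name are hypothetical, and storing `time` and `total_size` as numbers is an assumption. As far as I know, `time` is on DynamoDB's reserved-word list, so the sketch aliases it through ExpressionAttributeNames, which is harmless either way.

```python
from typing import Any, Dict

GSI_NAME = "GSI_SizeByBucket"


def recent_history_query(table_name: str, bucket_name: str, since_epoch: int) -> Dict[str, Any]:
    """Query arguments for recent history on the main table, oldest first."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "bucket_name = :b AND #t >= :since",
        "ExpressionAttributeNames": {"#t": "time"},  # alias for the sort key
        "ExpressionAttributeValues": {
            ":b": {"S": bucket_name},
            ":since": {"N": str(since_epoch)},  # assumed numeric epoch seconds
        },
        "ScanIndexForward": True,
    }


def max_size_query(table_name: str, bucket_name: str) -> Dict[str, Any]:
    """Query arguments for the historical maximum via the GSI, no scan needed."""
    return {
        "TableName": table_name,
        "IndexName": GSI_NAME,
        "KeyConditionExpression": "bucket_name = :b",
        "ExpressionAttributeValues": {":b": {"S": bucket_name}},
        "ScanIndexForward": False,  # descending by total_size, the GSI sort key
        "Limit": 1,  # the first item is the largest recorded size
    }


print(max_size_query("example-table", "example-data-bucket"))
```

The GSI query works because items sharing a bucket_name are ordered by total_size, so reading one item in descending order returns the maximum directly.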

3.2 Why Two Buckets Are Reasonable

The assignment is sometimes described as if it only needs one bucket, but this implementation uses:

one data bucket for tracked object activity
one plot bucket for generated image output

That is still a reasonable interpretation of the requirement because the buckets have different roles.

3.3 Naming Rule

The important naming requirement is to avoid hardcoded physical names for deployable AWS resources.

In this repo, the practical rule is:

do not hardcode S3 bucket names
do not hardcode Lambda function names
do not hardcode CloudWatch LogGroup names
let CDK generate physical names automatically

Using a fixed internal identifier such as the DynamoDB GSI name is acceptable because it is part of the table schema and can be passed into Lambda code through environment variables.
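
On the Lambda side, this rule means reading names from the environment instead of embedding them in the code. A minimal sketch follows; the variable names DATA_BUCKET_NAME, TABLE_NAME, and GSI_NAME are hypothetical, and the sample values only imitate the style of CDK-generated names.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; the repo may use different ones.
REQUIRED_VARS = ("DATA_BUCKET_NAME", "TABLE_NAME", "GSI_NAME")


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read resource names injected at deploy time, failing fast if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Made-up values for illustration only.
config = load_config({
    "DATA_BUCKET_NAME": "storagestack-databucket-example",
    "TABLE_NAME": "StorageStack-HistoryTable-EXAMPLE",
    "GSI_NAME": "GSI_SizeByBucket",
})
print(config["TABLE_NAME"])
```

Failing fast on a missing variable surfaces a wiring mistake on the first invocation rather than as a confusing AWS error later.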

4. CDK Project Initialization

cdk init creates the starting scaffold for a new CDK project.

For a Python app, it typically generates:

app.py
cdk.json
requirements.txt
the initial stack file

Use:

cdk init app --language python

Important:

the command is cdk
app is the template type
--language python tells CDK to generate a Python CDK project

Run it inside a new empty directory:

mkdir my_cdk_project
cd my_cdk_project
cdk init app --language python

If the directory is not empty, cdk init refuses to initialize it.

Common init errors

`cdk: command not found`

Cause:

CDK CLI is not installed or not on PATH

Fix:

npm install -g aws-cdk
cdk --version

Init fails in a non-empty directory

Cause:

the target directory already contains files

Fix:

use a new empty folder
or intentionally clean the folder first

5. Local Python Setup

From the project root:

python3 -m venv .venv
source .venv/bin/activate
./.venv/bin/python -m pip install -r requirements.txt

Then run a local synth-style check:

./.venv/bin/python app.py

This is useful because it verifies:

imports work
stack wiring works
CDK can synthesize templates

6. AWS Account and Credential Setup

Before any deploy or destroy, verify both the active AWS account and the active region.

The main AWS CLI config files are:

~/.aws/credentials
~/.aws/config

Check the current shell:

echo $AWS_PROFILE
echo $AWS_DEFAULT_REGION

Best verification commands:

aws configure list
aws sts get-caller-identity
env | grep '^AWS_'

For this project, the normal shell setup is:

export AWS_PROFILE=alan-admin
export AWS_DEFAULT_REGION=us-east-1
export CDK_DEFAULT_REGION=us-east-1
aws sts get-caller-identity

This matters because:

the repo commands assume alan-admin
the matplotlib layer ARN targets us-east-1
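
A small pre-flight check can catch a wrong profile or region before any deploy. The helper below is a hypothetical sketch that compares only the three variables exported above. It does not replace aws sts get-caller-identity, which is what confirms the actual account.

```python
from typing import List, Mapping, Optional


def check_aws_env(
    env: Mapping[str, str],
    profile: str = "alan-admin",
    region: str = "us-east-1",
) -> List[str]:
    """Return one message per variable that differs from the expected setup."""
    expected = {
        "AWS_PROFILE": profile,
        "AWS_DEFAULT_REGION": region,
        "CDK_DEFAULT_REGION": region,
    }
    problems = []
    for name, want in expected.items():
        got: Optional[str] = env.get(name)
        if got != want:
            problems.append(f"{name} is {got!r}, expected {want!r}")
    return problems


# Example with a deliberately wrong region and a missing CDK variable.
for problem in check_aws_env({"AWS_PROFILE": "alan-admin", "AWS_DEFAULT_REGION": "us-west-2"}):
    print(problem)
```

In a real shell session the call would be check_aws_env(os.environ) after importing os.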

If you want to clear the explicit settings:

unset AWS_PROFILE
unset AWS_DEFAULT_REGION
unset CDK_DEFAULT_REGION

Then verify again with:

aws configure list
aws sts get-caller-identity

7. How CDK and CloudFormation Work Together

CDK does not replace CloudFormation.

The relationship is:

CDK = authoring layer
CloudFormation = execution layer

The actual flow is:
Python CDK code -> CloudFormation template -> CloudFormation stack -> AWS resources

That leads to three practical rules:

cdk synth or python app.py checks whether valid templates can be produced
cdk deploy asks CloudFormation to create or update resources
cdk destroy asks CloudFormation to delete resources

So when deployment fails:

CDK output is only the surface symptom
CloudFormation events are the real source of truth

8. CloudFormation Basics You Need for This Repo

A CloudFormation stack is a managed unit of infrastructure created from a template.

In this project:

StorageStack is one CloudFormation stack
LambdaStack is another CloudFormation stack

Important stack lifecycle states:

CREATE_IN_PROGRESS
CREATE_COMPLETE
UPDATE_IN_PROGRESS
UPDATE_COMPLETE
UPDATE_ROLLBACK_IN_PROGRESS
UPDATE_ROLLBACK_FAILED
DELETE_IN_PROGRESS
DELETE_FAILED

Why this matters:

CDK is not directly creating resources by itself
CloudFormation controls the actual stack lifecycle
if rollback fails, the stack can get stuck and block normal CDK operations
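
One way to read the states listed above is to ask whether the next cdk deploy or cdk destroy can run right now. The sketch below covers only those eight states. The labels "busy", "ready", and "stuck" are shorthand introduced here, not CloudFormation terms.

```python
# Meaning of each lifecycle state from the list above.
STATUS_MEANING = {
    "CREATE_IN_PROGRESS": "busy",
    "CREATE_COMPLETE": "ready",
    "UPDATE_IN_PROGRESS": "busy",
    "UPDATE_COMPLETE": "ready",
    "UPDATE_ROLLBACK_IN_PROGRESS": "busy",
    "UPDATE_ROLLBACK_FAILED": "stuck",  # needs recovery at the CloudFormation level
    "DELETE_IN_PROGRESS": "busy",
    "DELETE_FAILED": "stuck",  # needs recovery at the CloudFormation level
}


def classify_stack_status(status: str) -> str:
    """Return 'busy', 'ready', 'stuck', or 'unknown' for states not listed here."""
    return STATUS_MEANING.get(status, "unknown")


for status in ("UPDATE_COMPLETE", "DELETE_IN_PROGRESS", "DELETE_FAILED"):
    print(f"{status}: {classify_stack_status(status)}")
```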

9. Bootstrap, Synth, Deploy, and Destroy

9.1 `cdk bootstrap`

Bootstrap prepares one AWS account and one region for CDK deployment.

Run it when:

you use a new AWS account
you use a new region
bootstrap resources were deleted

Command used in this repo:

cdk bootstrap --app "./.venv/bin/python app.py"

9.2 `cdk synth`

cdk synth is not just a syntax check: it executes the CDK app and synthesizes it into CloudFormation templates, which confirms that the app can generate valid templates.

cdk synth --app "./.venv/bin/python app.py"

cdk synth:

runs the CDK app,
converts the stacks into CloudFormation templates,
writes the synthesized output to cdk.out,
and may also print the template to the terminal.

Just running cdk synth will not create anything in AWS.
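
Because synth only writes files, its output can be inspected locally before anything is deployed. The sketch below assumes the usual cdk.out layout of one `<StackName>.template.json` file per stack; the helper name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def count_resource_types(cdk_out: str = "cdk.out") -> Dict[str, Counter]:
    """Map each synthesized stack name to a count of its resource types."""
    root = Path(cdk_out)
    if not root.is_dir():
        return {}  # nothing synthesized yet
    summary: Dict[str, Counter] = {}
    for template_path in sorted(root.glob("*.template.json")):
        template = json.loads(template_path.read_text())
        stack_name = template_path.name[: -len(".template.json")]
        summary[stack_name] = Counter(
            resource["Type"] for resource in template.get("Resources", {}).values()
        )
    return summary


if __name__ == "__main__":
    for stack, types in count_resource_types().items():
        print(stack, dict(types))
```

Run after cdk synth, this gives a quick view of how many buckets, tables, and functions each stack would create.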

9.3 `cdk deploy`

Deploy command:

cdk deploy --all --app "./.venv/bin/python app.py"

CloudFormation will then either:

keep resources unchanged
update resources in place
replace resources if the change requires replacement

9.4 `cdk destroy`

Normal cleanup command in this repo:

AWS_PROFILE=alan-admin bash ./scripts/destroy_app.sh us-east-1

This is the preferred normal cleanup path when the stacks are healthy.

10. Recommended Working Session

10.1 Typical local workflow

source .venv/bin/activate
export AWS_PROFILE=alan-admin
export AWS_DEFAULT_REGION=us-east-1
export CDK_DEFAULT_REGION=us-east-1
aws sts get-caller-identity
./.venv/bin/python app.py

10.2 Typical deploy workflow

source .venv/bin/activate
export AWS_PROFILE=alan-admin
export AWS_DEFAULT_REGION=us-east-1
export CDK_DEFAULT_REGION=us-east-1
cdk bootstrap --app "./.venv/bin/python app.py"
cdk deploy --all --app "./.venv/bin/python app.py"

10.3 Typical validation checklist

After deployment, verify:

StorageStack and LambdaStack exist in CloudFormation
both S3 buckets exist
DynamoDB contains history records
the plotting API returns HTTP 200
the plot object exists in the plot bucket

10.4 Demo checklist

For the assignment demo, you should be able to show:

the repo at the required commit
successful CDK deployment
both stacks in CloudFormation
deployed AWS resources
manual invocation of the driver Lambda
DynamoDB history records
the generated plot object

11. Stateful Resources and Replacement Risk

Some infrastructure changes can be applied in place.

Other changes force CloudFormation to replace the resource:

create a new resource
shift dependencies
delete the old resource

This is resource replacement.

Stateful resources are risky because they hold data.

In this project, the important stateful resources are:

S3 buckets
DynamoDB tables

The current project is cleanup-friendly, and therefore destructive, because it uses:

RemovalPolicy.DESTROY
auto_delete_objects=True on buckets

That means:

deleting a bucket also deletes every object in it
deleting the table removes stored history
replacing a stateful resource can also remove old data

Practical rule:

before changing stateful infrastructure, ask whether the change is in-place or replacement
for this assignment, replacement is acceptable for demo data
for anything important, replacement is not acceptable without a data migration plan
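
The in-place-or-replacement question can be answered from a CloudFormation change set before deploying. The sketch below assumes input shaped like the Changes list returned by DescribeChangeSet. The helper name and the sample entries are made up for illustration.

```python
from typing import Any, Dict, List

# The stateful resource types that matter in this project.
STATEFUL_TYPES = {"AWS::S3::Bucket", "AWS::DynamoDB::Table"}


def data_loss_risks(changes: List[Dict[str, Any]]) -> List[str]:
    """Return logical IDs of stateful resources that may be replaced or removed."""
    risky = []
    for change in changes:
        rc = change.get("ResourceChange", {})
        if rc.get("ResourceType") not in STATEFUL_TYPES:
            continue
        replaced = rc.get("Action") == "Modify" and rc.get("Replacement") in ("True", "Conditional")
        removed = rc.get("Action") == "Remove"
        if replaced or removed:
            risky.append(rc["LogicalResourceId"])
    return risky


# Hypothetical change set entries.
sample_changes = [
    {"ResourceChange": {"Action": "Modify", "LogicalResourceId": "DataBucket",
                        "ResourceType": "AWS::S3::Bucket", "Replacement": "True"}},
    {"ResourceChange": {"Action": "Modify", "LogicalResourceId": "SizeTrackingFunction",
                        "ResourceType": "AWS::Lambda::Function", "Replacement": "False"}},
    {"ResourceChange": {"Action": "Remove", "LogicalResourceId": "HistoryTable",
                        "ResourceType": "AWS::DynamoDB::Table"}},
]
print(data_loss_risks(sample_changes))  # ['DataBucket', 'HistoryTable']
```

Treating "Conditional" as risky is a deliberately cautious choice: it means CloudFormation cannot tell in advance whether the resource will be replaced.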

12. Custom Resources and Stack Failure Recovery

This project can involve CDK-managed custom resources such as:

Custom::S3BucketNotifications

These matter because a stack can fail in ways that are not obvious from top-level CDK output.

One common failure pattern is:

an S3 bucket is replaced or deleted
the notification custom resource still runs during cleanup
it tries to update notifications on a bucket that no longer exists
AWS returns NoSuchBucket
the stack gets stuck in DELETE_FAILED or rollback failure

If cdk destroy is no longer enough:

stop repeating the same CDK command
check CloudFormation events
find the exact failing logical resource
recover through CloudFormation directly
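
Finding the exact failing logical resource usually means locating the earliest failed event, since later failures tend to be consequences of the first one. The sketch below assumes events shaped like the StackEvents list from DescribeStackEvents. The helper name, timestamps, and status reasons are made up.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def first_failure(events: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the earliest event whose status ends in _FAILED, or None."""
    failed = [event for event in events if event["ResourceStatus"].endswith("_FAILED")]
    if not failed:
        return None
    return min(failed, key=lambda event: event["Timestamp"])


# Hypothetical events, newest first as the API typically returns them.
sample_events = [
    {"LogicalResourceId": "StorageStack", "ResourceStatus": "DELETE_FAILED",
     "ResourceStatusReason": "The following resource(s) failed to delete",
     "Timestamp": datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)},
    {"LogicalResourceId": "DataBucketNotifications11EB1C2E", "ResourceStatus": "DELETE_FAILED",
     "ResourceStatusReason": "NoSuchBucket",
     "Timestamp": datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc)},
    {"LogicalResourceId": "DataBucket", "ResourceStatus": "DELETE_COMPLETE",
     "Timestamp": datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)},
]

root_cause = first_failure(sample_events)
print(root_cause["LogicalResourceId"], root_cause["ResourceStatusReason"])
```

The logical ID found this way is the kind of value that would be passed to a retain-resources recovery.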

Example recovery command:

AWS_PROFILE=alan-admin aws cloudformation delete-stack \
  --stack-name StorageStack \
  --region us-east-1 \
  --retain-resources DataBucketNotifications11EB1C2E \
  --deletion-mode FORCE_DELETE_STACK

Then wait:

AWS_PROFILE=alan-admin aws cloudformation wait stack-delete-complete \
  --stack-name StorageStack \
  --region us-east-1

This is a recovery path only, not the normal workflow.

13. One-Page Summary

Assignment 3 is Assignment 2 behavior packaged as CDK-managed infrastructure.
CDK is the authoring layer; CloudFormation is the execution layer.
The app uses two stacks:
- StorageStack for stateful storage resources
- LambdaStack for compute, API, and integration resources
The normal lifecycle is:
- verify AWS account and region
- initialize environment
- run synth
- bootstrap if needed
- deploy
- validate
- destroy when finished
Avoid hardcoded physical names for deployable AWS resources.
Treat stateful resource replacement as potentially destructive.
If a stack is badly stuck, debug at the CloudFormation level instead of treating CDK output as the root cause.
