Reference: Assignment Requirements
This assignment implements a Lambda-based, event-driven system that
- monitors changes in an S3 bucket,
- records object size history in DynamoDB,
- and generates visualizations of the tracked data.
Quick Takeaway
This assignment is a two-stack AWS CDK application that defines cloud infrastructure in Python and deploys it through CloudFormation.
- It provisions an S3 bucket and a Lambda-based size-tracking service that records object size history.
- By using infrastructure as code, it enables automated deployment, reproducible environments, and simplified infrastructure lifecycle management.
The end-to-end flow is:
S3 object change -> EventBridge -> size_tracking Lambda -> DynamoDB -> plotting Lambda -> plot bucket
The practical workflow is:
- verify AWS account and region
- initialize or open the CDK project
- create the Python virtual environment
- run local synth checks
- bootstrap if needed
- deploy both stacks
- run the demo flow and validate outputs
- destroy or recover stacks when needed
Python CDK code → CloudFormation template → AWS resources
1. What is the purpose of Assignment 3?
Assignment 3 keeps the Assignment 2 application behavior, but replaces manual AWS Console setup with AWS CDK.
That means:
- the Lambda business logic still follows the same core workflow
- infrastructure is now defined in Python CDK code
- deployment, update, and cleanup should go through CDK and CloudFormation
The system still needs the same functional behavior:
- S3 object changes happen in the tracked bucket
- a size-tracking Lambda records bucket state into DynamoDB
- a plotting Lambda generates a chart from recent history
- a driver Lambda performs the demo sequence and triggers plotting
The driver sequence remains:
- create assignment1.txt with content Empty Assignment 1
- update assignment1.txt to Empty Assignment 2222222222
- delete assignment1.txt
- create assignment2.txt with content 33
- call the plotting API
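The sequence above can be sketched as data plus a small runner. This is a hypothetical sketch, not the repo's driver_lambda.py: the file names and contents come from the assignment text, while the function name, bucket argument, and S3-style client are assumptions.

```python
# Hypothetical sketch of the driver demo sequence; only the file names and
# contents come from the assignment text, everything else is illustrative.
STEPS = [
    ("create", "assignment1.txt", "Empty Assignment 1"),           # 18 bytes
    ("update", "assignment1.txt", "Empty Assignment 2222222222"),  # 27 bytes
    ("delete", "assignment1.txt", None),
    ("create", "assignment2.txt", "33"),                           # 2 bytes
]

def run_demo(s3, bucket):
    """Apply each step to the tracked bucket via an S3-style client."""
    for action, key, body in STEPS:
        if action == "delete":
            s3.delete_object(Bucket=bucket, Key=key)
        else:  # in S3 terms, create and update are both put_object calls
            s3.put_object(Bucket=bucket, Key=key, Body=body.encode())
```

Each step fires an S3 event, which is why the size-tracking history ends up with one record per step.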
2. Architecture and Project Structure
2.1 Stack Design
This project currently uses two stacks:
- StorageStack: creates the stateful resources first
- LambdaStack: consumes them
2.2 Resource Ownership
StorageStack
Creates:
- one data s3 bucket
- one plot s3 bucket
- one DynamoDB table
- one GSI named GSI_SizeByBucket
LambdaStack
Creates:
- size-tracking Lambda
- plotting Lambda
- driver Lambda
- API Gateway REST API
- EventBridge rule
- CloudWatch log groups
- IAM permissions
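The split above can be sketched in app.py terms. This is an illustrative fragment assuming aws-cdk-lib v2; the construct IDs and the sort-key type are assumptions, not the repo's actual names:

```python
# Illustrative app.py wiring: StorageStack owns the stateful resources,
# LambdaStack would receive them as plain constructor arguments.
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3, aws_dynamodb as dynamodb
from constructs import Construct

class StorageStack(cdk.Stack):
    def __init__(self, scope: Construct, id: str, **kwargs):
        super().__init__(scope, id, **kwargs)
        self.data_bucket = s3.Bucket(self, "DataBucket")
        self.plot_bucket = s3.Bucket(self, "PlotBucket")
        self.table = dynamodb.Table(
            self, "HistoryTable",
            partition_key=dynamodb.Attribute(
                name="bucket_name", type=dynamodb.AttributeType.STRING),
            sort_key=dynamodb.Attribute(  # sort-key type is an assumption
                name="time", type=dynamodb.AttributeType.STRING),
        )
        self.table.add_global_secondary_index(
            index_name="GSI_SizeByBucket",
            partition_key=dynamodb.Attribute(
                name="bucket_name", type=dynamodb.AttributeType.STRING),
            sort_key=dynamodb.Attribute(
                name="total_size", type=dynamodb.AttributeType.NUMBER),
        )

app = cdk.App()
storage = StorageStack(app, "StorageStack")
# LambdaStack (not shown) would take storage.data_bucket, storage.plot_bucket,
# and storage.table as inputs, creating an implicit stack dependency.
app.synth()
```

Passing the constructs directly is what makes CloudFormation deploy StorageStack before LambdaStack.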
2.3 Lambda Responsibilities
size_tracking_lambda.py
- lists objects in the tracked bucket
- computes object_cnt and total_size
- writes a history record to DynamoDB
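The summarizing step is simple to isolate. A minimal sketch, assuming a list_objects_v2-style listing (the Size field follows the S3 API; the function name is made up, and the real handler would also paginate and add the bucket_name/time keys before put_item):

```python
def summarize_bucket(objects):
    """Fold an S3 object listing into the fields stored in one history record."""
    sizes = [obj["Size"] for obj in objects]  # "Size" per the list_objects_v2 shape
    return {"object_cnt": len(sizes), "total_size": sum(sizes)}
```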
plotting_lambda.py
- reads recent history from DynamoDB
- reads the historical maximum from the GSI
- generates a matplotlib plot
- uploads the plot to the plot bucket
driver_lambda.py
- performs the demo file sequence
- triggers the plotting API
Expected visible history (total_size after each tracked change): 0 -> 18 -> 27 -> 0 -> 2
2.4 Project Layout
assignment3_project/
├── app.py
├── cdk.json
├── README.md
├── requirements.txt
├── assignment3_app/
│ ├── __init__.py
│ ├── storage_stack.py
│ └── lambda_stack.py
├── lambdas/
│ ├── size_tracking_lambda.py
│ ├── plotting_lambda.py
│ └── driver_lambda.py
├── scripts/
│ └── destroy_app.sh
└── tests/

3. Resource-Level Expectations
3.1 Data Model
The DynamoDB table stores history by bucket over time.
Current schema:
- partition key: bucket_name
- sort key: time

Current GSI:
- name: GSI_SizeByBucket
- partition key: bucket_name
- sort key: total_size
This supports:
- recent-history queries from the main table
- historical-maximum queries from the GSI without using scan
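Both query patterns can be sketched against a boto3 Table-like object. The key names and index name come from the schema above; the function names are illustrative, and the table object is injected rather than constructed here:

```python
def recent_history(table, bucket_name, limit=10):
    # Main-table query: newest records first via the time sort key.
    resp = table.query(
        KeyConditionExpression="bucket_name = :b",
        ExpressionAttributeValues={":b": bucket_name},
        ScanIndexForward=False,  # descending by sort key
        Limit=limit,
    )
    return resp["Items"]

def historical_max(table, bucket_name):
    # GSI query: total_size is the sort key, so the top item is the maximum.
    resp = table.query(
        IndexName="GSI_SizeByBucket",
        KeyConditionExpression="bucket_name = :b",
        ExpressionAttributeValues={":b": bucket_name},
        ScanIndexForward=False,
        Limit=1,
    )
    items = resp["Items"]
    return items[0] if items else None
```

Neither call touches scan: both are keyed queries, which is exactly what the GSI buys.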
3.2 Why Two Buckets Are Reasonable
The assignment is sometimes described as if it only needs one bucket, but this implementation uses:
- one data bucket for tracked object activity
- one plot bucket for generated image output
That is still a reasonable interpretation of the requirement because the buckets have different roles.
3.3 Naming Rule
The important naming requirement is to avoid hardcoded physical names for deployable AWS resources.
In this repo, the practical rule is:
- do not hardcode S3 bucket names
- do not hardcode Lambda function names
- do not hardcode CloudWatch LogGroup names
- let CDK generate physical names automatically
Using a fixed internal identifier such as the DynamoDB GSI name is acceptable because it is part of the table schema and can be passed into Lambda code through environment variables.
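On the Lambda side, the CDK-generated physical names arrive through environment variables. A sketch of that pattern; the variable names here are assumptions, and only GSI_SizeByBucket is a value fixed by the schema:

```python
import os

# Physical names are injected by CDK at deploy time, never hardcoded here.
# The variable names are illustrative; only the GSI name is a fixed schema value.
def load_config(env=os.environ):
    return {
        "table_name": env["TABLE_NAME"],    # CDK-generated table name
        "plot_bucket": env["PLOT_BUCKET"],  # CDK-generated bucket name
        "gsi_name": env.get("GSI_NAME", "GSI_SizeByBucket"),  # fixed internal id
    }
```

This keeps the Lambda code deployable to any account or region without edits.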
4. CDK Project Initialization
cdk init creates the starting scaffold for a new CDK project.
For a Python app, it typically generates:
- app.py
- cdk.json
- requirements.txt
- the initial stack file
Use:
cdk init app --language python

Important:
- cdk is the CLI command
- app is the template type
- --language python tells CDK to generate a Python CDK project
Run it inside a new empty directory:
mkdir my_cdk_project
cd my_cdk_project
cdk init app --language python

If the directory is not empty, CDK may refuse to initialize it.
Common init errors
cdk: command not found
Cause:
- CDK CLI is not installed or not on PATH
Fix:
npm install -g aws-cdk
cdk --version

Init fails in a non-empty directory
Cause:
- the target directory already contains files
Fix:
- use a new empty folder
- or intentionally clean the folder first
5. Local Python Setup
From the project root:
python3 -m venv .venv
source .venv/bin/activate
./.venv/bin/python -m pip install -r requirements.txt

Then run a local synth-style check:

./.venv/bin/python app.py

This is useful because it verifies:
- imports work
- stack wiring works
- CDK can synthesize templates
6. AWS Account and Credential Setup
Before any deploy or destroy, verify both the active AWS account and the active region.
The main AWS CLI config files are:
~/.aws/credentials
~/.aws/config

Check the current shell:
echo $AWS_PROFILE
echo $AWS_DEFAULT_REGION

Best verification commands:
aws configure list
aws sts get-caller-identity
env | rg '^AWS_'

For this project, the normal shell setup is:
export AWS_PROFILE=alan-admin
export AWS_DEFAULT_REGION=us-east-1
export CDK_DEFAULT_REGION=us-east-1
aws sts get-caller-identity

This matters because:
- the repo commands assume alan-admin
- the matplotlib layer ARN targets us-east-1
If you want to clear the explicit settings:
unset AWS_PROFILE
unset AWS_DEFAULT_REGION
unset CDK_DEFAULT_REGION

Then verify again with:
aws configure list
aws sts get-caller-identity

7. How CDK and CloudFormation Work Together
CDK does not replace CloudFormation.
The relationship is:
- CDK = authoring layer
- CloudFormation = execution layer
The actual flow is:
Python CDK code -> CloudFormation template -> CloudFormation stack -> AWS resources
That leads to three practical rules:
- cdk synth (or python app.py) checks whether valid templates can be produced
- cdk deploy asks CloudFormation to create or update resources
- cdk destroy asks CloudFormation to delete resources
So when deployment fails:
- CDK output is only the surface symptom
- CloudFormation events are the real source of truth
8. CloudFormation Basics You Need for This Repo
A CloudFormation stack is a managed unit of infrastructure created from a template.
In this project:
- StorageStack is one CloudFormation stack
- LambdaStack is another CloudFormation stack
Important stack lifecycle states:
- CREATE_IN_PROGRESS
- CREATE_COMPLETE
- UPDATE_IN_PROGRESS
- UPDATE_COMPLETE
- UPDATE_ROLLBACK_IN_PROGRESS
- UPDATE_ROLLBACK_FAILED
- DELETE_IN_PROGRESS
- DELETE_FAILED
Why this matters:
- CDK is not directly creating resources by itself
- CloudFormation controls the actual stack lifecycle
- if rollback fails, the stack can get stuck and block normal CDK operations
9. Bootstrap, Synth, Deploy, and Destroy
9.1 cdk bootstrap
Bootstrap prepares one AWS account and one region for CDK deployment.
Run it when:
- you use a new AWS account
- you use a new region
- bootstrap resources were deleted
Command used in this repo:
cdk bootstrap --app "./.venv/bin/python app.py"

9.2 cdk synth
This checks whether the CDK app can generate valid templates:

cdk synth --app "./.venv/bin/python app.py"

cdk synth is not just a syntax check: it executes the CDK app and synthesizes it into CloudFormation templates.

cdk synth:
- runs the CDK app,
- converts the stacks into CloudFormation templates,
- writes the synthesized output to cdk.out,
Just running cdk synth will not create anything in AWS.
9.3 cdk deploy
Deploy command:
cdk deploy --all --app "./.venv/bin/python app.py"

CloudFormation will then either:
- keep resources unchanged
- update resources in place
- replace resources if the change requires replacement
9.4 cdk destroy
Normal cleanup command in this repo:
AWS_PROFILE=alan-admin bash ./scripts/destroy_app.sh us-east-1

This is the preferred cleanup path when the stacks are healthy.
10. Recommended Working Session
10.1 Typical local workflow
source .venv/bin/activate
export AWS_PROFILE=alan-admin
export AWS_DEFAULT_REGION=us-east-1
export CDK_DEFAULT_REGION=us-east-1
aws sts get-caller-identity
./.venv/bin/python app.py

10.2 Typical deploy workflow
source .venv/bin/activate
export AWS_PROFILE=alan-admin
export AWS_DEFAULT_REGION=us-east-1
export CDK_DEFAULT_REGION=us-east-1
cdk bootstrap --app "./.venv/bin/python app.py"
cdk deploy --all --app "./.venv/bin/python app.py"

10.3 Typical validation checklist
After deployment, verify:
- StorageStack and LambdaStack exist in CloudFormation
- both S3 buckets exist
- DynamoDB contains history records
- the plotting API returns HTTP 200
- the plot object exists in the plot bucket
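Part of this checklist can be automated. A sketch using boto3-style clients; the clients are injected (so nothing here names real credentials), the stack names come from the repo, and the function name is made up:

```python
def validate_deployment(cfn, s3, plot_bucket):
    """Run the post-deploy checks; returns a check-name -> bool map."""
    stacks = {s["StackName"]: s["StackStatus"]
              for s in cfn.describe_stacks()["Stacks"]}
    healthy = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}
    plot_objects = s3.list_objects_v2(Bucket=plot_bucket).get("Contents", [])
    return {
        "storage_stack_ok": stacks.get("StorageStack") in healthy,
        "lambda_stack_ok": stacks.get("LambdaStack") in healthy,
        "plot_exists": len(plot_objects) > 0,
    }
```

The DynamoDB-record and HTTP-200 checks could be added the same way with a table resource and an HTTP client.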
10.4 Demo checklist
For the assignment demo, you should be able to show:
- the repo at the required commit
- successful CDK deployment
- both stacks in CloudFormation
- deployed AWS resources
- manual invocation of the driver Lambda
- DynamoDB history records
- the generated plot object
11. Stateful Resources and Replacement Risk
Some infrastructure changes can be applied in place.
Other changes force CloudFormation to replace the resource:
- create a new resource
- shift dependencies
- delete the old resource
This is resource replacement.
Stateful resources are risky because they hold data.
In this project, the important stateful resources are:
- S3 buckets
- DynamoDB tables
The current project is cleanup-friendly but deliberately destructive, because it uses:
- RemovalPolicy.DESTROY
- auto_delete_objects=True on buckets
That means:
- deleting a bucket can delete all objects in it
- deleting the table removes stored history
- replacing a stateful resource can also remove old data
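In CDK terms, the destructive settings look roughly like this. A sketch assuming aws-cdk-lib v2; the stack and construct IDs are illustrative:

```python
from aws_cdk import RemovalPolicy, Stack, aws_s3 as s3
from constructs import Construct

class DemoStorage(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs):
        super().__init__(scope, id, **kwargs)
        # Demo-friendly but destructive: destroying the stack deletes the
        # bucket, and auto_delete_objects empties it first via a Lambda-backed
        # custom resource that CDK adds behind the scenes.
        s3.Bucket(
            self, "DataBucket",
            removal_policy=RemovalPolicy.DESTROY,
            auto_delete_objects=True,
        )
```

For production data the usual choice would be RemovalPolicy.RETAIN, which orphans the bucket instead of deleting it.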
Practical rule:
- before changing stateful infrastructure, ask whether the change is in-place or replacement
- for this assignment, replacement is acceptable for demo data
- for anything important, replacement is not acceptable without a data migration plan
12. Custom Resources and Stack Failure Recovery
This project can involve CDK-managed custom resources such as:
Custom::S3BucketNotifications
These matter because a stack can fail in ways that are not obvious from top-level CDK output.
One common failure pattern is:
- an S3 bucket is replaced or deleted
- the notification custom resource still runs during cleanup
- it tries to update notifications on a bucket that no longer exists
- AWS returns NoSuchBucket
- the stack gets stuck in DELETE_FAILED or rollback failure
If cdk destroy is no longer enough:
- stop repeating the same CDK command
- check CloudFormation events
- find the exact failing logical resource
- recover through CloudFormation directly
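Finding the exact failing logical resource can be scripted over the stack events. A sketch against a CloudFormation-style client (the client is injected; the function name is made up):

```python
def failed_events(cfn, stack_name):
    """Return (logical_id, status, reason) for every *_FAILED stack event."""
    events = cfn.describe_stack_events(StackName=stack_name)["StackEvents"]
    return [
        (e["LogicalResourceId"], e["ResourceStatus"],
         e.get("ResourceStatusReason", ""))
        for e in events  # events arrive newest first
        if e["ResourceStatus"].endswith("_FAILED")
    ]
```

The logical IDs this returns are exactly what --retain-resources expects in the recovery command below.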
Example recovery command:
AWS_PROFILE=alan-admin aws cloudformation delete-stack \
--stack-name StorageStack \
--region us-east-1 \
--retain-resources DataBucketNotifications11EB1C2E \
--deletion-mode FORCE_DELETE_STACK

Then wait:
AWS_PROFILE=alan-admin aws cloudformation wait stack-delete-complete \
--stack-name StorageStack \
--region us-east-1

This is a recovery path only, not the normal workflow.
13. One-Page Summary
- Assignment 3 is Assignment 2 behavior packaged as CDK-managed infrastructure.
- CDK is the authoring layer; CloudFormation is the execution layer.
- The app uses two stacks:
- StorageStack for stateful storage resources
- LambdaStack for compute, API, and integration resources
- The normal lifecycle is:
- verify AWS account and region
- initialize environment
- run synth
- bootstrap if needed
- deploy
- validate
- destroy when finished
- Avoid hardcoded physical names for deployable AWS resources.
- Treat stateful resource replacement as potentially destructive.
- If a stack is badly stuck, debug at the CloudFormation level instead of treating CDK output as the root cause.