Reference: Assignment Requirements

This assignment implements a Lambda-based, event-driven system that

  • monitors changes in an S3 bucket,
  • records object size history in DynamoDB,
  • and generates visualizations of the tracked data.

Quick Takeaway

This assignment is a two-stack AWS CDK application: the infrastructure is defined in Python CDK code and deployed through CloudFormation.

  • It provisions the S3 buckets, the DynamoDB table, and a Lambda-based size-tracking service that records object size history.
  • By using infrastructure as code, it enables automated deployment, reproducible environments, and simplified infrastructure lifecycle management.

The end-to-end flow is:
S3 object change -> EventBridge -> size_tracking Lambda -> DynamoDB -> plotting Lambda -> plot bucket

The practical workflow is:

  1. verify AWS account and region
  2. initialize or open the CDK project
  3. create the Python virtual environment
  4. run local synth checks
  5. bootstrap if needed
  6. deploy both stacks
  7. run the demo flow and validate outputs
  8. destroy or recover stacks when needed

Python CDK code -> CloudFormation template -> AWS resources


1. What Is the Purpose of Assignment 3?

Assignment 3 keeps the Assignment 2 application behavior, but replaces manual AWS Console setup with AWS CDK.

That means:

  • the Lambda business logic still follows the same core workflow
  • infrastructure is now defined in Python CDK code
  • deployment, update, and cleanup should go through CDK and CloudFormation

The system still needs the same functional behavior:

  • S3 object changes happen in the tracked bucket
  • a size-tracking Lambda records bucket state into DynamoDB
  • a plotting Lambda generates a chart from recent history
  • a driver Lambda performs the demo sequence and triggers plotting

The driver sequence remains:

  1. create assignment1.txt with the content "Empty Assignment 1"
  2. update assignment1.txt to the content "Empty Assignment 2222222222"
  3. delete assignment1.txt
  4. create assignment2.txt with the content "33"
  5. call the plotting API

2. Architecture and Project Structure

2.1 Stack Design

This project currently uses two stacks:

  • StorageStack: creates the stateful resources first
  • LambdaStack: consumes them

2.2 Resource Ownership

StorageStack

Creates:

  • one data S3 bucket
  • one plot S3 bucket
  • one DynamoDB table
  • one GSI named GSI_SizeByBucket

LambdaStack

Creates:

  • size-tracking Lambda
  • plotting Lambda
  • driver Lambda
  • API Gateway REST API
  • EventBridge rule
  • CloudWatch log groups
  • IAM permissions

2.3 Lambda Responsibilities

size_tracking_lambda.py

  • lists objects in the tracked bucket
  • computes object_cnt and total_size
  • writes a history record to DynamoDB
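The three steps above can be sketched as follows. This is a minimal illustration, not the repo's actual handler. The function names (bucket_from_event, summarize_bucket, build_record) are hypothetical, the event shape assumes S3 events delivered through EventBridge, and storing time as epoch milliseconds is an assumption. The s3_client argument is expected to behave like a boto3 S3 client.

```python
from typing import Any, Dict, Tuple


def bucket_from_event(event: Dict[str, Any]) -> str:
    """Extract the bucket name from an S3 event delivered via EventBridge."""
    return event["detail"]["bucket"]["name"]


def summarize_bucket(s3_client: Any, bucket: str) -> Tuple[int, int]:
    """Return (object_cnt, total_size) by listing every object in the bucket."""
    object_cnt = 0
    total_size = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent when a page (or the whole bucket) is empty.
        for obj in page.get("Contents", []):
            object_cnt += 1
            total_size += obj["Size"]
    return object_cnt, total_size


def build_record(bucket: str, object_cnt: int, total_size: int, now: float) -> Dict[str, Any]:
    """Build one history item using the attribute names from the table schema."""
    return {
        "bucket_name": bucket,
        "time": int(now * 1000),  # assumption: epoch milliseconds as the sort key
        "object_cnt": object_cnt,
        "total_size": total_size,
    }
```

Passing the client in as a parameter keeps the counting logic testable without AWS access. In a real handler it would typically be created once with boto3.client("s3").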

plotting_lambda.py

  • reads recent history from DynamoDB
  • reads the historical maximum from the GSI
  • generates a matplotlib plot
  • uploads the plot to the plot bucket

driver_lambda.py

  • performs the demo file sequence
  • triggers the plotting API

Expected visible total_size history (bytes): 0 -> 18 -> 27 -> 0 -> 2
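The expected sizes follow directly from the byte lengths of the file contents in the driver sequence. The sketch below replays that sequence against an in-memory dict instead of S3. It assumes the contents are written without a trailing newline and that the leading 0 is the empty bucket before the first write. The function name replay_driver_sequence is hypothetical.

```python
from typing import Dict, List


def replay_driver_sequence() -> List[int]:
    """Replay the driver steps in memory and return total size after each step."""
    bucket: Dict[str, bytes] = {}
    history = [0]  # assumption: the bucket starts empty

    def record() -> None:
        history.append(sum(len(body) for body in bucket.values()))

    bucket["assignment1.txt"] = b"Empty Assignment 1"            # create
    record()
    bucket["assignment1.txt"] = b"Empty Assignment 2222222222"   # update
    record()
    del bucket["assignment1.txt"]                                # delete
    record()
    bucket["assignment2.txt"] = b"33"                            # create
    record()
    return history


print(replay_driver_sequence())
```

If the real history shows different numbers, the usual suspects are a trailing newline in the object body or an extra object left in the bucket from an earlier run.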

2.4 Project Layout

assignment3_project/
├── app.py
├── cdk.json
├── README.md
├── requirements.txt
├── assignment3_app/
│   ├── __init__.py
│   ├── storage_stack.py
│   └── lambda_stack.py
├── lambdas/
│   ├── size_tracking_lambda.py
│   ├── plotting_lambda.py
│   └── driver_lambda.py
├── scripts/
│   └── destroy_app.sh
└── tests/

3. Resource-Level Expectations

3.1 Data Model

The DynamoDB table stores history by bucket over time.

Current schema:

  • partition key: bucket_name
  • sort key: time

Current GSI:

  • name: GSI_SizeByBucket
  • partition key: bucket_name
  • sort key: total_size

This supports:

  • recent-history queries from the main table
  • historical-maximum queries from the GSI without using scan
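Both access patterns map onto a DynamoDB Query rather than a Scan. The sketch below only builds the request parameters, in the shape accepted by the low-level boto3 DynamoDB client query call. The helper names are hypothetical, and it assumes time is stored as a Number. Adjust the type tags if the table stores it as a string.

```python
from typing import Any, Dict

GSI_NAME = "GSI_SizeByBucket"


def recent_history_query(table_name: str, bucket: str, since: int) -> Dict[str, Any]:
    """Main-table query: items for one bucket with time >= since, oldest first."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "bucket_name = :b AND #t >= :since",
        # "time" collides with a DynamoDB reserved word, so it is aliased as #t.
        "ExpressionAttributeNames": {"#t": "time"},
        "ExpressionAttributeValues": {
            ":b": {"S": bucket},
            ":since": {"N": str(since)},  # assumption: time is a Number attribute
        },
        "ScanIndexForward": True,
    }


def max_size_query(table_name: str, bucket: str) -> Dict[str, Any]:
    """GSI query: the single item with the largest total_size for one bucket."""
    return {
        "TableName": table_name,
        "IndexName": GSI_NAME,
        "KeyConditionExpression": "bucket_name = :b",
        "ExpressionAttributeValues": {":b": {"S": bucket}},
        "ScanIndexForward": False,  # descending by the GSI sort key (total_size)
        "Limit": 1,
    }
```

With a real client, the parameters would be passed as client.query(**max_size_query(table_name, bucket)). Because the GSI sorts by total_size, reading one item in descending order returns the maximum without touching the rest of the table.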

3.2 Why Two Buckets Are Reasonable

The assignment is sometimes described as if it only needs one bucket, but this implementation uses:

  • one data bucket for tracked object activity
  • one plot bucket for generated image output

That is still a reasonable interpretation of the requirement because the buckets have different roles.

3.3 Naming Rule

The important naming requirement is to avoid hardcoded physical names for deployable AWS resources.

In this repo, the practical rule is:

  • do not hardcode S3 bucket names
  • do not hardcode Lambda function names
  • do not hardcode CloudWatch LogGroup names
  • let CDK generate physical names automatically

Using a fixed internal identifier such as the DynamoDB GSI name is acceptable because it is part of the table schema and can be passed into Lambda code through environment variables.
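On the Lambda side, this rule means the code reads generated names from its environment instead of embedding them. A minimal sketch, assuming hypothetical variable names TABLE_NAME, DATA_BUCKET_NAME, and GSI_NAME (the repo may use different ones):

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TrackerConfig:
    table_name: str
    data_bucket: str
    gsi_name: str


def load_config(env: Mapping[str, str] = os.environ) -> TrackerConfig:
    """Read CDK-generated resource names from environment variables."""
    required = ("TABLE_NAME", "DATA_BUCKET_NAME", "GSI_NAME")  # hypothetical names
    missing = [name for name in required if name not in env]
    if missing:
        # Fail fast so a wiring mistake shows up at cold start, not mid-request.
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return TrackerConfig(
        table_name=env["TABLE_NAME"],
        data_bucket=env["DATA_BUCKET_NAME"],
        gsi_name=env["GSI_NAME"],
    )
```

The stack that defines the Lambda would set these variables from the generated names of the table and bucket, so a redeploy with new physical names needs no code change.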


4. CDK Project Initialization

cdk init creates the starting scaffold for a new CDK project.

For a Python app, it typically generates:

  • app.py
  • cdk.json
  • requirements.txt
  • the initial stack file

Use:

cdk init app --language python

Important:

  • the command is cdk
  • app is the template type
  • --language python tells CDK to generate a Python CDK project

Run it inside a new empty directory:

mkdir my_cdk_project
cd my_cdk_project
cdk init app --language python

If the directory is not empty, cdk init refuses to run.

Common init errors

cdk: command not found

Cause:

  • CDK CLI is not installed or not on PATH

Fix:

npm install -g aws-cdk
cdk --version

Init fails in a non-empty directory

Cause:

  • the target directory already contains files

Fix:

  • use a new empty folder
  • or intentionally clean the folder first

5. Local Python Setup

From the project root:

python3 -m venv .venv
source .venv/bin/activate
./.venv/bin/python -m pip install -r requirements.txt

Then run a local synth-style check:

./.venv/bin/python app.py

This is useful because it verifies:

  • imports work
  • stack wiring works
  • CDK can synthesize templates

6. AWS Account and Credential Setup

Before any deploy or destroy, verify both the active AWS account and the active region.

The main AWS CLI config files are:

~/.aws/credentials
~/.aws/config

Check the current shell:

echo $AWS_PROFILE
echo $AWS_DEFAULT_REGION

Best verification commands:

aws configure list
aws sts get-caller-identity
env | rg '^AWS_'

For this project, the normal shell setup is:

export AWS_PROFILE=alan-admin
export AWS_DEFAULT_REGION=us-east-1
export CDK_DEFAULT_REGION=us-east-1
aws sts get-caller-identity

This matters because:

  • the repo commands assume alan-admin
  • the matplotlib layer ARN targets us-east-1
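A small preflight script can catch a wrong profile or region before a deploy. The sketch below only inspects shell variables, so it complements aws sts get-caller-identity rather than replacing it. The function name preflight_problems is hypothetical.

```python
import os
from typing import List, Mapping

EXPECTED_PROFILE = "alan-admin"
EXPECTED_REGION = "us-east-1"


def preflight_problems(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable mismatches between the shell and the repo's setup."""
    problems: List[str] = []
    profile = env.get("AWS_PROFILE")
    if profile != EXPECTED_PROFILE:
        problems.append(f"AWS_PROFILE is {profile!r}, expected {EXPECTED_PROFILE!r}")
    for var in ("AWS_DEFAULT_REGION", "CDK_DEFAULT_REGION"):
        region = env.get(var)
        if region != EXPECTED_REGION:
            problems.append(f"{var} is {region!r}, expected {EXPECTED_REGION!r}")
    return problems


if __name__ == "__main__":
    for line in preflight_problems():
        print("WARNING:", line)
```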

If you want to clear the explicit settings:

unset AWS_PROFILE
unset AWS_DEFAULT_REGION
unset CDK_DEFAULT_REGION

Then verify again with:

aws configure list
aws sts get-caller-identity

7. How CDK and CloudFormation Work Together

CDK does not replace CloudFormation.

The relationship is:

  • CDK = authoring layer
  • CloudFormation = execution layer

The actual flow is:
Python CDK code -> CloudFormation template -> CloudFormation stack -> AWS resources

That leads to three practical rules:

  • cdk synth or python app.py checks whether valid templates can be produced
  • cdk deploy asks CloudFormation to create or update resources
  • cdk destroy asks CloudFormation to delete resources

So when deployment fails:

  • CDK output is only the surface symptom
  • CloudFormation events are the real source of truth
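To get from the surface symptom to the real cause, filter the stack events down to the failures. The sketch below works on the StackEvents list returned by aws cloudformation describe-stack-events (or the matching boto3 call), parsed from JSON. The function name failed_events is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def failed_events(events: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep only events whose status ends in _FAILED, with the fields that matter."""
    failures: List[Dict[str, str]] = []
    for event in events:
        status = event.get("ResourceStatus", "")
        if status.endswith("_FAILED"):
            failures.append(
                {
                    "logical_id": event.get("LogicalResourceId", ""),
                    "status": status,
                    # The reason text usually names the underlying AWS error.
                    "reason": event.get("ResourceStatusReason", ""),
                }
            )
    return failures
```

The logical_id values in the result are the logical resource IDs you would look up in the synthesized template or pass to a recovery command.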

8. CloudFormation Basics You Need for This Repo

A CloudFormation stack is a managed unit of infrastructure created from a template.

In this project:

  • StorageStack is one CloudFormation stack
  • LambdaStack is another CloudFormation stack

Important stack lifecycle states:

  • CREATE_IN_PROGRESS
  • CREATE_COMPLETE
  • UPDATE_IN_PROGRESS
  • UPDATE_COMPLETE
  • UPDATE_ROLLBACK_IN_PROGRESS
  • UPDATE_ROLLBACK_FAILED
  • DELETE_IN_PROGRESS
  • DELETE_FAILED

Why this matters:

  • CDK is not directly creating resources by itself
  • CloudFormation controls the actual stack lifecycle
  • if rollback fails, the stack can get stuck and block normal CDK operations
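The states above fall into a few groups that call for different reactions. A rough sketch of that grouping, based only on the status suffix; the function name and the four labels are hypothetical, not CloudFormation terminology:

```python
def classify_stack_status(status: str) -> str:
    """Group a CloudFormation stack status into a coarse category."""
    if status.endswith("_IN_PROGRESS"):
        return "in_progress"  # wait; do not start another operation
    if status.endswith("_FAILED"):
        return "stuck"        # inspect stack events before retrying
    if status.endswith("_COMPLETE") and "ROLLBACK" in status:
        return "rolled_back"  # the last change did not apply
    if status.endswith("_COMPLETE"):
        return "stable"
    return "unknown"


for name in ("CREATE_COMPLETE", "UPDATE_ROLLBACK_IN_PROGRESS",
             "UPDATE_ROLLBACK_FAILED", "DELETE_FAILED"):
    print(name, "->", classify_stack_status(name))
```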

9. Bootstrap, Synth, Deploy, and Destroy

9.1 cdk bootstrap

Bootstrap prepares one AWS account and one region for CDK deployment.

Run it when:

  • you use a new AWS account
  • you use a new region
  • bootstrap resources were deleted

Command used in this repo:

cdk bootstrap --app "./.venv/bin/python app.py"

9.2 cdk synth

This checks whether the CDK app can generate valid templates:

cdk synth --app "./.venv/bin/python app.py"

cdk synth is not just a syntax check: it executes the CDK app and synthesizes it into CloudFormation templates.

cdk synth:

  • runs the CDK app,
  • converts the stacks into CloudFormation templates,
  • writes the synthesized output to cdk.out,
  • and may also print the template to the terminal.

Just running cdk synth will not create anything in AWS.
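Since synth only writes files, the output can be inspected locally before anything is deployed. The sketch below counts resource types in each synthesized template, assuming the templates are written as <StackName>.template.json under cdk.out. The function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def resource_type_counts(template_path: Path) -> Counter:
    """Count CloudFormation resource types declared in one template file."""
    template = json.loads(Path(template_path).read_text())
    return Counter(res["Type"] for res in template.get("Resources", {}).values())


def summarize_cdk_out(out_dir: Path = Path("cdk.out")) -> Dict[str, Counter]:
    """Map each synthesized template file name to its resource type counts."""
    return {
        path.name: resource_type_counts(path)
        for path in sorted(Path(out_dir).glob("*.template.json"))
    }


if __name__ == "__main__":
    for name, counts in summarize_cdk_out().items():
        print(name, dict(counts))
```

A quick look at the counts (for example, two AWS::S3::Bucket entries and one AWS::DynamoDB::Table in the storage template) is a cheap sanity check that the stacks own the resources you expect.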

9.3 cdk deploy

Deploy command:

cdk deploy --all --app "./.venv/bin/python app.py"

CloudFormation will then either:

  • keep resources unchanged
  • update resources in place
  • replace resources if the change requires replacement

9.4 cdk destroy

Normal cleanup command in this repo:

AWS_PROFILE=alan-admin bash ./scripts/destroy_app.sh us-east-1

This is the preferred normal cleanup path when the stacks are healthy.


10. Recommended Working Session

10.1 Typical local workflow

source .venv/bin/activate
export AWS_PROFILE=alan-admin
export AWS_DEFAULT_REGION=us-east-1
export CDK_DEFAULT_REGION=us-east-1
aws sts get-caller-identity
./.venv/bin/python app.py

10.2 Typical deploy workflow

source .venv/bin/activate
export AWS_PROFILE=alan-admin
export AWS_DEFAULT_REGION=us-east-1
export CDK_DEFAULT_REGION=us-east-1
cdk bootstrap --app "./.venv/bin/python app.py"
cdk deploy --all --app "./.venv/bin/python app.py"

10.3 Typical validation checklist

After deployment, verify:

  1. StorageStack and LambdaStack exist in CloudFormation
  2. both S3 buckets exist
  3. DynamoDB contains history records
  4. the plotting API returns HTTP 200
  5. the plot object exists in the plot bucket

10.4 Demo checklist

For the assignment demo, you should be able to show:

  1. the repo at the required commit
  2. successful CDK deployment
  3. both stacks in CloudFormation
  4. deployed AWS resources
  5. manual invocation of the driver Lambda
  6. DynamoDB history records
  7. the generated plot object

11. Stateful Resources and Replacement Risk

Some infrastructure changes can be applied in place.

Other changes force CloudFormation to replace the resource:

  1. create a new resource
  2. shift dependencies
  3. delete the old resource

This is resource replacement.

Stateful resources are risky because they hold data.

In this project, the important stateful resources are:

  • S3 buckets
  • DynamoDB tables

The current project is cleanup-friendly, and therefore destructive, because it uses:

  • RemovalPolicy.DESTROY
  • auto_delete_objects=True on buckets

That means:

  • deleting a bucket also deletes every object in it
  • deleting the table removes stored history
  • replacing a stateful resource can also remove old data

Practical rule:

  • before changing stateful infrastructure, ask whether the change is in-place or replacement
  • for this assignment, replacement is acceptable for demo data
  • for anything important, replacement is not acceptable without a data migration plan
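One way to answer the in-place-or-replacement question is to inspect a CloudFormation change set before executing it. The sketch below scans the Changes list from aws cloudformation describe-change-set (parsed from JSON) and flags stateful resources that would be replaced or removed. The function name and the STATEFUL_TYPES set are hypothetical and cover only the resource types used in this project.

```python
from typing import Any, Dict, Iterable, List

STATEFUL_TYPES = {"AWS::S3::Bucket", "AWS::DynamoDB::Table"}


def risky_replacements(changes: Iterable[Dict[str, Any]]) -> List[str]:
    """Return logical IDs of stateful resources that may lose data."""
    risky: List[str] = []
    for change in changes:
        rc = change.get("ResourceChange", {})
        if rc.get("ResourceType") not in STATEFUL_TYPES:
            continue
        removed = rc.get("Action") == "Remove"
        # Replacement is reported as "True", "False", or "Conditional".
        replaced = rc.get("Replacement") in ("True", "Conditional")
        if removed or replaced:
            risky.append(rc.get("LogicalResourceId", ""))
    return risky
```

Treating "Conditional" as risky is a deliberately cautious choice: CloudFormation cannot always tell in advance whether a property change will force replacement.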

12. Custom Resources and Stack Failure Recovery

This project can involve CDK-managed custom resources such as:

  • Custom::S3BucketNotifications

These matter because a stack can fail in ways that are not obvious from top-level CDK output.

One common failure pattern is:

  1. an S3 bucket is replaced or deleted
  2. the notification custom resource still runs during cleanup
  3. it tries to update notifications on a bucket that no longer exists
  4. AWS returns NoSuchBucket
  5. the stack gets stuck in DELETE_FAILED or UPDATE_ROLLBACK_FAILED

If cdk destroy is no longer enough:

  • stop repeating the same CDK command
  • check CloudFormation events
  • find the exact failing logical resource
  • recover through CloudFormation directly

Example recovery command:

AWS_PROFILE=alan-admin aws cloudformation delete-stack \
  --stack-name StorageStack \
  --region us-east-1 \
  --retain-resources DataBucketNotifications11EB1C2E \
  --deletion-mode FORCE_DELETE_STACK

Then wait:

AWS_PROFILE=alan-admin aws cloudformation wait stack-delete-complete \
  --stack-name StorageStack \
  --region us-east-1

This is a recovery path only, not the normal workflow.


13. One-Page Summary

  • Assignment 3 is Assignment 2 behavior packaged as CDK-managed infrastructure.
  • CDK is the authoring layer; CloudFormation is the execution layer.
  • The app uses two stacks:
    • StorageStack for stateful storage resources
    • LambdaStack for compute, API, and integration resources
  • The normal lifecycle is:
    • verify AWS account and region
    • initialize environment
    • run synth
    • bootstrap if needed
    • deploy
    • validate
    • destroy when finished
  • Avoid hardcoded physical names for deployable AWS resources.
  • Treat stateful resource replacement as potentially destructive.
  • If a stack is badly stuck, debug at the CloudFormation level instead of treating CDK output as the root cause.