Example: Project manager system
Publish ProjectManager domain events to the company’s Kafka-based event streaming platform.
- We use `project_id` as the message key so that all state-change events for the same project go to the same partition, preserving per-partition ordering.
- On the producer side, we configure `acks=all` with retries and an idempotent producer to improve durability and prevent duplicate writes caused by retries.
- On the consumer side, we follow a process-success-then-commit strategy to achieve at-least-once delivery, and we enforce idempotency at the database layer using a unique constraint on `event_id` to avoid duplicate side effects.
- We also version our event schemas to support schema evolution, and we monitor consumer lag, retries, and a DLQ to ensure observability and recoverability.
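The producer-side settings above can be sketched as follows. The config keys (`acks`, `enable.idempotence`, `retries`) are real Kafka producer settings; the broker address is a placeholder, and the partitioner function is a simplified stand-in for Kafka's default (murmur2) that only illustrates the key point: a fixed `project_id` always maps to the same partition.

```python
import hashlib

# Producer settings described in the notes above (confluent-kafka-style keys).
producer_config = {
    "bootstrap.servers": "kafka:9092",  # placeholder address
    "acks": "all",                      # wait for all in-sync replicas
    "enable.idempotence": True,         # broker dedupes retried sends
    "retries": 5,                       # retry transient send failures
}

def choose_partition(key: str, num_partitions: int) -> int:
    """Deterministic key -> partition mapping (stand-in for murmur2)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events keyed by project "p-42" land on the same partition,
# so consumers see that project's state changes in order.
assert choose_partition("p-42", 6) == choose_partition("p-42", 6)
```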
Improvement:
- Before (synchronous chain, tightly coupled):
  - Create Project → triggers multiple blocking API calls:
    - ProjectService: write DB
    - ContractService: create the contract
    - NotificationService: email stakeholders
  - Downside: one slow or failing downstream service can delay or fail the whole request.
- After (event-driven with Kafka, at-least-once consumption):
  - ProjectService creates the project → publishes a `ProjectCreated` event to Kafka
  - Downstream consumers:
    - ContractService: create the contract
      - idempotent handler / dedupe by `eventId`
    - NotificationService: email stakeholders
      - idempotent: avoid double-sends via an `eventId` log table
  - Benefit: shorter critical path, less coupling, better resilience, and easier to extend (add new consumers without changing ProjectService).
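The idempotent-handler idea above can be sketched with a unique constraint on `event_id`, as the notes describe. Table and handler names are illustrative, SQLite stands in for the real database, and the Kafka polling loop is replaced by direct calls so the sketch is self-contained; the offset commit would happen only after the handler returns (process-success-then-commit).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE contracts (project_id TEXT)")

def handle_project_created(event: dict) -> bool:
    """Run the ProjectCreated side effect at most once per event_id.

    Returns True if the side effect ran, False if the event was a
    duplicate redelivery. Only after this returns would the consumer
    commit its Kafka offset.
    """
    try:
        # The unique constraint rejects duplicates atomically.
        conn.execute("INSERT INTO processed_events VALUES (?)",
                     (event["event_id"],))
    except sqlite3.IntegrityError:
        return False  # already processed: redelivery after a retry/crash
    conn.execute("INSERT INTO contracts VALUES (?)", (event["project_id"],))
    conn.commit()  # dedupe record + side effect land in one transaction
    return True

event = {"event_id": "evt-1", "project_id": "p-42"}
assert handle_project_created(event) is True   # first delivery: runs
assert handle_project_created(event) is False  # redelivery: deduped
```

Keeping the dedupe insert and the business write in the same database transaction is what makes the at-least-once redelivery safe: either both land or neither does.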
Kafka Benefits:
- Decoupling (loose coupling between services)
- Producers publish events without knowing who consumes them.
- New downstream features can be added by subscribing to the topic—no changes to the producer API.
- Resilience (fault isolation & recovery)
- If a downstream service is slow or temporarily down, events remain in Kafka and can be processed after recovery.
- Failures are isolated: one consumer’s outage doesn’t take down the whole request path.
- Buffering (traffic smoothing & backpressure)
- Kafka acts as a durable buffer during traffic spikes, protecting databases and external dependencies.
- Consumers can scale horizontally and process at their own pace, reducing overload and stabilizing latency.
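The buffering benefit can be illustrated with a toy simulation: a deque stands in for a Kafka topic partition, and the "database" can only absorb a fixed batch size per poll, so a burst of events is drained gradually instead of hitting the database all at once. All names and numbers here are illustrative.

```python
from collections import deque

topic = deque()       # stand-in for a durable Kafka partition
DB_BATCH_LIMIT = 100  # max writes the database tolerates per poll

# Traffic spike: 1,000 events arrive at once.
for i in range(1000):
    topic.append({"event_id": f"evt-{i}"})

polls = 0
while topic:
    # The consumer polls at its own pace, never exceeding the DB's limit.
    batch = [topic.popleft()
             for _ in range(min(DB_BATCH_LIMIT, len(topic)))]
    polls += 1  # one bounded unit of work against the database

assert polls == 10  # the spike is spread over 10 bounded batches
```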