Example: Project manager system

Publish ProjectManager domain events to the company’s Kafka-based event streaming platform.

  • We use project_id as the message key so that all state-change events for the same project go to the same partition, preserving ordering per partition.
  • On the producer side, we configure acks=all with retries and enable the idempotent producer to improve durability and prevent duplicate writes caused by retries.
  • On the consumer side, we follow a process-success-then-commit strategy to achieve at-least-once delivery, and we enforce idempotency at the database layer using a unique constraint on event_id to avoid duplicate side effects.
  • We also version our event schemas to support schema evolution, and we monitor consumer lag, retry counts, and a dead-letter queue (DLQ) to ensure observability and recoverability.
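The keying rule above can be sketched in a few lines. This is a minimal Python illustration, not a real Kafka client: the partition count (6) is hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner actually uses.

```python
# Sketch of Kafka-style key-based partitioning (illustrative only).
# Kafka's default partitioner uses murmur2 over the key bytes; we
# substitute zlib.crc32 here as a stand-in stable hash.
import zlib

NUM_PARTITIONS = 6  # hypothetical partition count for the topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key deterministically to a partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# All events keyed by the same project_id land on the same partition,
# so per-project ordering is preserved.
assert partition_for("project-42") == partition_for("project-42")
```

The key point is determinism: as long as the partition count is stable, every state-change event for `project-42` is appended to the same partition log, and consumers read them in order.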
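The process-then-commit plus unique-constraint pattern on the consumer side can be sketched like this, with an in-memory SQLite table standing in for the service's real database; the `processed_events` table and `handle` function are hypothetical names for illustration.

```python
# Sketch: enforce consumer idempotency with a unique constraint on
# event_id, so at-least-once redeliveries cannot repeat side effects.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE processed_events (
        event_id TEXT PRIMARY KEY,   -- the unique constraint doing the dedupe
        payload  TEXT NOT NULL
    )
""")


def handle(event_id: str, payload: str) -> bool:
    """Apply the side effect exactly once; redeliveries become no-ops."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO processed_events (event_id, payload) VALUES (?, ?)",
                (event_id, payload),
            )
        return True          # first delivery: side effect applied
    except sqlite3.IntegrityError:
        return False         # redelivery: already processed, skip


handle("evt-1", "ProjectCreated:42")   # processed
handle("evt-1", "ProjectCreated:42")   # duplicate, silently ignored
```

Because the insert and the side effect commit in the same transaction, a crash between processing and offset commit only leads to a redelivery that the constraint absorbs.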

Improvement:

  • Before (synchronous chain, tightly coupled):
    • Create Project → triggers multiple blocking API calls:
      • ProjectService: write DB
      • ContractService: create the contract
      • NotificationService: email stakeholders
    • Downside: one slow/failing downstream service can delay or fail the whole request.
  • After (event-driven with Kafka, at-least-once consumption):
    • ProjectService creates the project → publishes a ProjectCreated event to Kafka
    • Downstream:
      • ContractService: create the contract
        • idempotent handler / dedupe by eventId
      • NotificationService: email stakeholders
        • idempotent: avoid double-send via eventId/log table
    • Benefit: shorter critical path, less coupling, better resilience, easier to extend (add new consumers without changing ProjectService).
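The "after" flow can be simulated end to end without a broker. This toy in-memory sketch (a list stands in for the Kafka topic; handler and topic names are illustrative) shows the short critical path and how per-consumer dedupe by eventId survives an at-least-once redelivery.

```python
# Toy simulation of the event-driven flow: ProjectService publishes a
# ProjectCreated event; each consumer dedupes by eventId so redelivery
# cannot cause a duplicate contract or a double email.
from collections import defaultdict

topic = []                       # stands in for the Kafka topic
contracts, emails = [], []       # downstream side effects
seen = defaultdict(set)          # per-consumer dedupe log keyed by eventId


def publish(event):
    topic.append(event)


def create_contract(event):      # ContractService handler
    if event["eventId"] in seen["contract"]:
        return                   # redelivery: skip
    seen["contract"].add(event["eventId"])
    contracts.append(event["projectId"])


def email_stakeholders(event):   # NotificationService handler
    if event["eventId"] in seen["notify"]:
        return
    seen["notify"].add(event["eventId"])
    emails.append(event["projectId"])


# ProjectService's critical path is just: write DB, publish, return.
publish({"eventId": "evt-1", "type": "ProjectCreated", "projectId": 42})

# Simulate at-least-once delivery: the same event arrives twice.
for event in topic * 2:
    create_contract(event)
    email_stakeholders(event)

assert contracts == [42] and emails == [42]   # each side effect ran once
```

Adding a new downstream feature here means adding one more handler to the loop; `publish` and ProjectService are untouched.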

Kafka Benefits:

  • Decoupling (loose coupling between services)
    • Producers publish events without knowing who consumes them.
    • New downstream features can be added by subscribing to the topic—no changes to the producer API.
  • Resilience (fault isolation & recovery)
    • If a downstream service is slow or temporarily down, events remain in Kafka and can be processed after recovery.
    • Failures are isolated: one consumer’s outage doesn’t take down the whole request path.
  • Buffering (traffic smoothing & backpressure)
    • Kafka acts as a durable buffer during traffic spikes, protecting databases and external dependencies.
    • Consumers can scale horizontally and process at their own pace, reducing overload and stabilizing latency.
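The buffering benefit reduces to a simple property: a burst of produces is absorbed by the log, and consumers drain it at their own pace. A toy sketch with a deque standing in for a partition (batch size and event count are arbitrary):

```python
# Toy illustration of Kafka as a durable buffer during a traffic spike.
from collections import deque

buffer = deque()                       # stands in for a Kafka partition

# Spike: 1,000 events arrive faster than downstream can handle them.
for i in range(1000):
    buffer.append({"eventId": f"evt-{i}"})

# The consumer polls in small batches at its own pace; nothing is dropped,
# and the database behind it only ever sees BATCH-sized chunks.
processed = []
BATCH = 50
while buffer:
    for _ in range(min(BATCH, len(buffer))):
        processed.append(buffer.popleft())

assert len(processed) == 1000          # the spike was smoothed, not lost
```

In real Kafka the buffer is also durable on disk and replicated, so the same property holds across consumer restarts, which is what makes the recovery story in the resilience bullet work.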