Kafka

  • Event streaming platform: a distributed commit log used for event streaming
    • producers append events to topic partitions
    • consumers read them as streams using offsets
Producer (writes) → Kafka (stores & replicates) → Consumer (reads, processes, commits offsets)

Concepts

  • Topic
    • A topic is the name of a category of messages, like a channel.
    • e.g., project-events
  • Message/Record
    • A single event record inside a topic.
    • e.g., a PROJECT_APPROVED event whose payload carries projectId, budget, approvedBy…
  • Partition
    • A topic is split into multiple partitions (shards)
      • Parallelism: multiple partitions can be consumed by multiple consumers in parallel
      • Ordering: Kafka only guarantees order within a single partition
  • Offset
    • Offset = the position number of a message within a partition (incrementing from 0).
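A minimal in-memory sketch (plain Python, not the Kafka implementation) of how keys map records to partitions and how offsets increment per partition; the `Topic` class and topic name are illustrative only:

```python
# Illustrative model of topic/partition/offset mechanics.
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key always land in the same partition,
        # so per-key ordering holds (within that one partition).
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        offset = len(self.partitions[p]) - 1  # offsets start at 0 per partition
        return p, offset

t = Topic("project-events", num_partitions=3)
p1, o1 = t.append("project-42", "PROJECT_CREATED")
p2, o2 = t.append("project-42", "PROJECT_APPROVED")
assert p1 == p2      # same key -> same partition
assert o2 == o1 + 1  # offsets increase within a partition
```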
  • Broker
    • A broker is one server in a Kafka cluster
  • Replica
    • For each partition:
      • Leader: the primary replica; handles writes (producers) and reads (consumers)
      • Follower: a standby replica that copies the leader's data (backup)
    • Each partition can have multiple replicas spread across different brokers, so data survives a machine failure.
  • ISR (In-Sync Replicas) ✅ core
    • ISR = the set of replicas keeping up with the leader
    • Only followers in the ISR count as "in sync".
    • With acks=all, Kafka waits until the ISR has the write before acknowledging success (more crash-tolerant)
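A toy decision function (an assumed model, not broker code) for the acks=all rule: the write fails fast if the ISR is smaller than min.insync.replicas, and succeeds only once every ISR member has the record:

```python
# Sketch of the acks=all acknowledgement rule.
def write_acks_all(isr, min_insync, replicated_ok):
    # isr: replica ids currently in sync; replicated_ok: replicas that
    # persisted this record before the timeout.
    if len(isr) < min_insync:
        return "NOT_ENOUGH_REPLICAS"   # reject up front, nothing to wait for
    if all(r in replicated_ok for r in isr):
        return "OK"                    # every ISR member has the record
    return "TIMEOUT"                   # some ISR member is still behind

assert write_acks_all({"b1", "b2", "b3"}, 2, {"b1", "b2", "b3"}) == "OK"
assert write_acks_all({"b1"}, 2, {"b1"}) == "NOT_ENOUGH_REPLICAS"
```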
  • Producer / Consumer
    • Producer: the sender (e.g., ProjectService publishing events)
    • Consumer: the receiver (e.g., ContractService subscribing to events to create contracts)
  • Consumer Group ✅ core
    • How Kafka lets a group of machines cooperate on consumption.
    • Rules:
      • Within one consumer group: a partition is assigned to at most one consumer at a time
      • This gives parallel consumption without duplicate processing (within the group)
    • Example: ContractService runs 3 instances in the same group; they split the partitions among themselves.
  • Rebalance
    • When consumers join, leave, crash, or scale, Kafka reassigns partitions among the group's consumers; this process is called a rebalance.
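A sketch of the assignment rule (simple round-robin, not Kafka's actual assignors): every partition goes to exactly one consumer in the group, and losing a consumer just redistributes its partitions:

```python
# Toy round-robin partition assignment for one consumer group.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 6 partitions, 3 ContractService instances in one group:
a = assign(range(6), ["c1", "c2", "c3"])
owned = sorted(p for ps in a.values() for p in ps)
assert owned == list(range(6))   # each partition owned by exactly one consumer

# "Rebalance": c3 crashes -> partitions redistributed over c1, c2.
a2 = assign(range(6), ["c1", "c2"])
assert sorted(p for ps in a2.values() for p in ps) == list(range(6))
```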
  • Delivery (At-most-once / At-least-once / Exactly-once)
    • At-most-once: may lose messages, never duplicates (rarely used)
    • At-least-once: no loss, but duplicates are possible (most common)
    • Exactly-once: no loss and no duplicates, as far as the system can guarantee (more complex)

Delivery semantics

  • We first choose the desired delivery semantics (e.g., at-least-once).

  • Then we implement it by combining

    • producer settings (acks/retries/idempotence) and
    • consumer offset commit strategy (commit after processing)
    • plus idempotent downstream writes to tolerate duplicates.
  • Delivery models

    • At-most-once (commit → process)
      • processed 0 or 1 time (no duplicates), but can lose messages.
      • How it happens: consumer commits offset before processing (or drops on error).
      • Use when: losing an event is acceptable (rare), or you prioritize latency over correctness.
    • At-least-once (process → commit) - most common
      • processed 1 or more times (no loss), but duplicates are possible.
      • How it happens: consumer processes first, then commits offset; if it crashes after processing but before commit → it reprocesses.
      • Use when: correctness matters; handle duplicates via idempotency/dedup.
    • Exactly-once (process + commit + produce)
      • processed exactly 1 time (no loss, no duplicates) from the system’s point of view.
      • In Kafka: achievable with idempotent producer + transactions and careful consumer/processing design (often summarized as “EOS”).
      • Use when: duplicates are very costly and you can afford complexity (payments-like workflows, some stream processing).
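The commit-order distinction above can be shown with a small crash simulation (plain Python, no Kafka client): a consumer crashes once mid-stream and restarts from its last committed offset:

```python
# Simulate a consumer that crashes once at a given message index,
# then resumes from the last committed offset.
def consume(messages, commit_first, crash_once_at):
    committed = 0          # offset of the next message to read
    processed = []
    crashed = False
    while committed < len(messages):
        i = committed
        if commit_first:
            committed = i + 1              # at-most-once: commit, then process
            if not crashed and i == crash_once_at:
                crashed = True             # crash before processing -> message lost
                continue
            processed.append(messages[i])
        else:
            processed.append(messages[i])  # at-least-once: process, then commit
            if not crashed and i == crash_once_at:
                crashed = True             # crash before commit -> reprocessed
                continue
            committed = i + 1
    return processed

msgs = ["m0", "m1", "m2"]
assert consume(msgs, commit_first=True,  crash_once_at=1) == ["m0", "m2"]              # m1 lost
assert consume(msgs, commit_first=False, crash_once_at=1) == ["m0", "m1", "m1", "m2"]  # m1 duplicated
```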

Producer side and Consumer side

  • Producer Side:

    • acks:
      • acks=0 (fastest, highest risk of loss)
        1. Producer sends the message
        2. Producer does not wait for any response and treats it as success
      • acks=1 (leader only, good latency)
        1. Producer sends the message to the partition leader
        2. Leader appends the record to its log
        3. Leader replies OK immediately
      • acks=all/acks=-1 (strongest durability, higher latency)
        1. Producer sends the message to the leader
        2. Leader appends to its log
        3. Leader waits for acknowledgements from in-sync replicas (ISR)
        4. Once the number of in-sync replicas that have written the record meets min.insync.replicas, the leader replies OK
    • retries (retry on failure)
    • idempotent producer
      • "enable.idempotence": True
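The producer settings above, gathered into one config sketch using Kafka's standard property names (as passed to, e.g., confluent-kafka's Producer; the bootstrap address is a placeholder assumption):

```python
# Durability-oriented producer settings (sketch, not a full production config).
producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumption: local broker
    "acks": "all",                # wait for the in-sync replicas
    "enable.idempotence": True,   # broker deduplicates on producer retries
    "retries": 5,                 # survive transient network/broker failures
}
# Broker/topic side: min.insync.replicas=2 makes acks=all fail fast when
# fewer than 2 replicas are in sync (better to error than to lose data).
```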
  • Consumer Side

    • poll → process → commit offset (depends on the delivery model)
  • At-least-once

    • Producer:
      • Set acks=all to improve durability
      • Enable retries to survive transient network/broker failures (avoid dropping messages).
      • Enable idempotent producer
    • Consumer:
      • Commit offsets only after processing succeeds (core)
      • Make downstream writes idempotent
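A sketch of the idempotent downstream write (the handler and `event_id` field are illustrative assumptions): track processed event IDs so an at-least-once redelivery is applied only once:

```python
# Idempotent consumer sketch: dedupe by event id before writing downstream.
processed_ids = set()   # in production: a unique constraint / dedup table
contracts = []

def handle(event):
    if event["event_id"] in processed_ids:
        return          # duplicate delivery: already applied, skip
    contracts.append({"project_id": event["projectId"]})
    processed_ids.add(event["event_id"])

evt = {"event_id": "e-1", "type": "PROJECT_APPROVED", "projectId": 42}
handle(evt)
handle(evt)             # redelivered duplicate (at-least-once)
assert len(contracts) == 1   # downstream effect applied exactly once
```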