Kafka

  • Event streaming platform: a distributed commit log used for event streaming
    • producers append events to topic partitions
    • consumers read them as streams using offsets
Producer (writes) → Kafka (stores & replicates) → Consumer (reads, processes, commits offsets)

Concepts

  • Topic
    • A topic is the name of a category of messages, like a channel.
    • e.g., project-events
  • Message/Record
    • A single event record inside a topic.
    • e.g., a PROJECT_APPROVED event whose payload carries projectId, budget, approvedBy…
  • Partition
    • A topic is split into multiple partitions (shards)
      • Parallelism: multiple partitions can be consumed by multiple consumers in parallel
      • Ordering: Kafka only guarantees order within a single partition
  • Offset
    • Offset = the position number of a message within a partition (incrementing from 0).
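A minimal in-memory sketch (plain Python, not the Kafka implementation) of how keys map records to partitions and how offsets increment per partition; the `Topic` class and topic name are illustrative only:

```python
# Illustrative model of topic/partition/offset mechanics.
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key always land in the same partition,
        # so per-key ordering holds (within that one partition).
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        offset = len(self.partitions[p]) - 1  # offsets start at 0 per partition
        return p, offset

t = Topic("project-events", num_partitions=3)
p1, o1 = t.append("project-42", "PROJECT_CREATED")
p2, o2 = t.append("project-42", "PROJECT_APPROVED")
assert p1 == p2      # same key -> same partition
assert o2 == o1 + 1  # offsets increase within a partition
```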
  • Broker
    • A broker is one server in a Kafka cluster
  • Replica
    • For each partition:
      • Leader: the primary replica; handles writes (producers) and reads (consumers)
      • Follower: a standby replica that copies the leader's data (backup)
    • Each partition can have multiple replicas spread across different brokers, so data survives a machine failure.
  • ISR (In-Sync Replicas) ✅ core
    • ISR = the set of replicas keeping up with the leader
    • Only followers in the ISR count as "in sync".
    • With acks=all, Kafka waits until the ISR has the write before acknowledging success (more crash-tolerant)
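A toy decision function (an assumed model, not broker code) for the acks=all rule: the write fails fast if the ISR is smaller than min.insync.replicas, and succeeds only once every ISR member has the record:

```python
# Sketch of the acks=all acknowledgement rule.
def write_acks_all(isr, min_insync, replicated_ok):
    # isr: replica ids currently in sync; replicated_ok: replicas that
    # persisted this record before the timeout.
    if len(isr) < min_insync:
        return "NOT_ENOUGH_REPLICAS"   # reject up front, nothing to wait for
    if all(r in replicated_ok for r in isr):
        return "OK"                    # every ISR member has the record
    return "TIMEOUT"                   # some ISR member is still behind

assert write_acks_all({"b1", "b2", "b3"}, 2, {"b1", "b2", "b3"}) == "OK"
assert write_acks_all({"b1"}, 2, {"b1"}) == "NOT_ENOUGH_REPLICAS"
```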
  • Producer / Consumer
    • Producer: the sender (e.g., ProjectService publishing events)
    • Consumer: the receiver (e.g., ContractService subscribing to events to create contracts)
  • Consumer Group ✅ core
    • How Kafka lets a group of machines cooperate on consumption.
    • Rules:
      • Within one consumer group: a partition is assigned to at most one consumer at a time
      • This gives parallel consumption without duplicate processing (within the group)
    • Example: ContractService runs 3 instances in the same group; they split the partitions among themselves.
  • Rebalance
    • When consumers join, leave, crash, or scale, Kafka reassigns partitions among the group's consumers; this process is called a rebalance.
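A sketch of the assignment rule (simple round-robin, not Kafka's actual assignors): every partition goes to exactly one consumer in the group, and losing a consumer just redistributes its partitions:

```python
# Toy round-robin partition assignment for one consumer group.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 6 partitions, 3 ContractService instances in one group:
a = assign(range(6), ["c1", "c2", "c3"])
owned = sorted(p for ps in a.values() for p in ps)
assert owned == list(range(6))   # each partition owned by exactly one consumer

# "Rebalance": c3 crashes -> partitions redistributed over c1, c2.
a2 = assign(range(6), ["c1", "c2"])
assert sorted(p for ps in a2.values() for p in ps) == list(range(6))
```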
  • Delivery (At-most-once / At-least-once / Exactly-once)
    • At-most-once: may lose messages, never duplicates (rarely used)
    • At-least-once: no loss, but duplicates are possible (most common)
    • Exactly-once: no loss and no duplicates, as far as the system can guarantee (more complex)

Delivery semantics

  • We first choose the desired delivery semantics (e.g., at-least-once).

  • Then we implement it by combining

    • producer settings (acks/retries/idempotence) and
    • consumer offset commit strategy (commit after processing)
    • plus idempotent downstream writes to tolerate duplicates.
  • Delivery models

    • At-most-once (commit → process)
      • processed 0 or 1 time (no duplicates), but can lose messages.
      • How it happens: consumer commits offset before processing (or drops on error).
      • Use when: losing an event is acceptable (rare), or you prioritize latency over correctness.
    • At-least-once (process → commit) - most common
      • processed 1 or more times (no loss), but duplicates are possible.
      • How it happens: consumer processes first, then commits offset; if it crashes after processing but before commit → it reprocesses.
      • Use when: correctness matters; handle duplicates via idempotency/dedup.
    • Exactly-once (process + commit + produce)
      • processed exactly 1 time (no loss, no duplicates) from the system’s point of view.
      • In Kafka: achievable with idempotent producer + transactions and careful consumer/processing design (often summarized as “EOS”).
      • Use when: duplicates are very costly and you can afford complexity (payments-like workflows, some stream processing).
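The commit-order distinction above can be shown with a small crash simulation (plain Python, no Kafka client): a consumer crashes once mid-stream and restarts from its last committed offset:

```python
# Simulate a consumer that crashes once at a given message index,
# then resumes from the last committed offset.
def consume(messages, commit_first, crash_once_at):
    committed = 0          # offset of the next message to read
    processed = []
    crashed = False
    while committed < len(messages):
        i = committed
        if commit_first:
            committed = i + 1              # at-most-once: commit, then process
            if not crashed and i == crash_once_at:
                crashed = True             # crash before processing -> message lost
                continue
            processed.append(messages[i])
        else:
            processed.append(messages[i])  # at-least-once: process, then commit
            if not crashed and i == crash_once_at:
                crashed = True             # crash before commit -> reprocessed
                continue
            committed = i + 1
    return processed

msgs = ["m0", "m1", "m2"]
assert consume(msgs, commit_first=True,  crash_once_at=1) == ["m0", "m2"]              # m1 lost
assert consume(msgs, commit_first=False, crash_once_at=1) == ["m0", "m1", "m1", "m2"]  # m1 duplicated
```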

Producer side and Consumer side

  • Producer Side:

    • acks:
      • acks=0 (fastest, highest risk of loss)
        1. Producer sends the message
        2. Producer does not wait for any response and treats it as success
      • acks=1 (leader only, good latency)
        1. Producer sends the message to the partition leader
        2. Leader appends the record to its log
        3. Leader replies OK immediately
      • acks=all/acks=-1 (strongest durability, higher latency)
        1. Producer sends the message to the leader
        2. Leader appends to its log
        3. Leader waits for acknowledgements from in-sync replicas (ISR)
        4. Once the number of in-sync replicas that have written the record meets min.insync.replicas, the leader replies OK
    • retries (retry on failure)
    • idempotent producer
      • "enable.idempotence": True
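The producer settings above, gathered into one config sketch using Kafka's standard property names (as passed to, e.g., confluent-kafka's Producer; the bootstrap address is a placeholder assumption):

```python
# Durability-oriented producer settings (sketch, not a full production config).
producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumption: local broker
    "acks": "all",                # wait for the in-sync replicas
    "enable.idempotence": True,   # broker deduplicates on producer retries
    "retries": 5,                 # survive transient network/broker failures
}
# Broker/topic side: min.insync.replicas=2 makes acks=all fail fast when
# fewer than 2 replicas are in sync (better to error than to lose data).
```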
  • Consumer Side

    • poll → process → commit offset (depends on the delivery model)
  • At-least-once

    • Producer:
      • Set acks=all to improve durability
      • Enable retries to survive transient network/broker failures (avoid dropping messages).
      • Enable idempotent producer
    • Consumer:
      • Commit offsets only after processing succeeds (core)
      • Make downstream writes idempotent
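A sketch of the idempotent downstream write (the handler and `event_id` field are illustrative assumptions): track processed event IDs so an at-least-once redelivery is applied only once:

```python
# Idempotent consumer sketch: dedupe by event id before writing downstream.
processed_ids = set()   # in production: a unique constraint / dedup table
contracts = []

def handle(event):
    if event["event_id"] in processed_ids:
        return          # duplicate delivery: already applied, skip
    contracts.append({"project_id": event["projectId"]})
    processed_ids.add(event["event_id"])

evt = {"event_id": "e-1", "type": "PROJECT_APPROVED", "projectId": 42}
handle(evt)
handle(evt)             # redelivered duplicate (at-least-once)
assert len(contracts) == 1   # downstream effect applied exactly once
```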