It’s not “background monitoring.” It’s one HTTP request that keeps the connection open.
The server yields chunks (e.g., yield "..."), and the client can receive and render those chunks as soon as they arrive.
The stream ends when the generator finishes (no more chunks) or when the client disconnects/aborts, then the connection closes.
Ensure proxies don’t buffer streaming responses; otherwise chunks accumulate at the proxy and won’t reach the client in real time.
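For example, if nginx sits in front of the app (an assumption — other proxies have analogous settings), buffering can be switched off for the streaming route:

```nginx
# Hypothetical location block: forward chunks as they arrive instead of buffering.
location /stream {
    proxy_pass http://127.0.0.1:8000;  # upstream address is an assumption
    proxy_buffering off;               # don't hold the response until it's complete
}
```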
Streaming mainly reduces TTFB (Time To First Byte) and perceived latency; the total time to finish the response is roughly the same.
Two streaming approaches
Chunked text streaming (most common)
Use media_type="text/plain" or application/octet-stream.
The server yields chunks (e.g., yield "..."), and the client uses fetch to read the response body as a stream (response.body).
SSE (Server-Sent Events)
Use media_type="text/event-stream".
Each yield must follow SSE format: data: xxx\n\n.
Great for event-like updates (progress, tokens, status). However, the browser’s native EventSource only issues GET requests, so it’s less convenient when you need a POST with a complex request body; in that case you fall back to fetch and parse the event stream yourself.
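The SSE wire format described above can be sketched as a plain generator (the JSON payloads are invented for illustration); with FastAPI this generator would be wrapped in a StreamingResponse with media_type="text/event-stream".

```python
# Sketch of SSE framing: each event is "data: <payload>\n\n".
import json

def sse_events(payloads):
    for payload in payloads:
        # One event per payload; the blank line ("\n\n") terminates the event.
        yield f"data: {json.dumps(payload)}\n\n"

frames = list(sse_events([{"progress": 50}, {"done": True}]))
print(frames[0])
```

Without the trailing blank line the browser keeps accumulating lines into the same event, so the `\n\n` is load-bearing, not cosmetic.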