Streams input tokens via an AsyncGenerator within a single request while output generation starts concurrently (an intra-request optimization)
Traditional flow: complete prompt submission → prefill → decode. With streaming input, prefill begins on tokens as they arrive, before the entire prompt is received, and decode starts immediately once the input completes. This overlap of input transfer and prefill is the key to reducing latency
Orthogonal to In-Flight Batching (an inter-request optimization), so the two can be combined
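
The overlap described above can be sketched as follows. This is a toy illustration, not a real engine API: `StreamingEngine`, `prefill_chunk`, and `token_stream` are hypothetical names, and the sleeps stand in for network delay and prefill compute.

```python
import asyncio
from typing import AsyncGenerator


class StreamingEngine:
    """Toy engine: prefill runs per chunk as input streams in (hypothetical)."""

    def __init__(self) -> None:
        self.kv_cache: list[str] = []  # stands in for the KV cache built during prefill

    async def prefill_chunk(self, tokens: list[str]) -> None:
        # A real engine would run a forward pass over the new tokens,
        # extending the KV cache; here we just record them.
        await asyncio.sleep(0.01)  # simulated prefill compute
        self.kv_cache.extend(tokens)

    async def decode(self, max_new_tokens: int) -> list[str]:
        # Decode can start immediately: the KV cache is already built.
        return [f"out{i}" for i in range(max_new_tokens)]


async def token_stream() -> AsyncGenerator[list[str], None]:
    # Input arrives in chunks (e.g. over the network) rather than all at once.
    for chunk in (["The", "quick"], ["brown", "fox"], ["jumps"]):
        await asyncio.sleep(0.02)  # simulated network delay
        yield chunk


async def generate() -> list[str]:
    engine = StreamingEngine()
    # Prefill overlaps with input arrival: each chunk is prefilled as it lands,
    # instead of waiting for the complete prompt.
    async for chunk in token_stream():
        await engine.prefill_chunk(chunk)
    # By the time the last chunk arrives, most prefill work is already done,
    # so decode starts with minimal extra latency.
    return await engine.decode(3)


print(asyncio.run(generate()))  # → ['out0', 'out1', 'out2']
```

In the traditional flow, prefill over the whole prompt would only begin after the last chunk arrived; here the per-chunk prefill cost is hidden behind the network delay.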

Seonglae Cho