In-Flight Batching

Creator: Seonglae Cho
Created: 2026 Mar 15 23:15
Edited: 2026 Mar 24 14:13
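A serving technique for LLM inference, also known as continuous batching, that dynamically merges newly arriving requests into a batch that is already in flight and processes them together. Scheduling happens at the iteration (token) level: a request that finishes generating leaves the batch immediately, and a waiting request takes its slot at the next step, so the batch never waits for its longest sequence to finish.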
  • Complex implementation
    • Different sequence lengths
    • Per-request KV cache allocation and release required as requests join and leave
  • Scheduling challenges
    • Fairness vs throughput tradeoff
  • KV cache memory fragmentation possible
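The contrast with request-level (static) batching, and why different sequence lengths matter, can be seen in a toy simulation. This is a minimal sketch under simplifying assumptions, not how any particular serving framework schedules: each request is a hypothetical `(arrival_step, output_len)` pair whose output length is known in advance (real servers do not know it), every step decodes one token per running request, and prefill cost and KV cache limits are ignored. The function names `static` and `in_flight` are illustrative only.

```python
from collections import deque

# Toy model (assumption): a request is (arrival_step, output_len).
# Each decode step produces one token for every running request;
# prefill cost and KV cache capacity are ignored.


def static(reqs, max_batch):
    """Request-level batching: a batch runs until its longest request finishes."""
    order = deque(sorted(range(len(reqs)), key=lambda i: reqs[i][0]))
    finish = {}
    step = 0
    while order:
        step = max(step, reqs[order[0]][0])  # idle until the next arrival
        batch = []
        while order and reqs[order[0]][0] <= step and len(batch) < max_batch:
            batch.append(order.popleft())
        for i in batch:
            finish[i] = step + reqs[i][1]
        # Slots freed by short requests stay idle until the longest one is done.
        step += max(reqs[i][1] for i in batch)
    return finish


def in_flight(reqs, max_batch):
    """Iteration-level batching: finished requests leave after each step and
    waiting requests fill the freed slots before the next step."""
    order = deque(sorted(range(len(reqs)), key=lambda i: reqs[i][0]))
    remaining = {}  # running request index -> tokens left to generate
    finish = {}
    step = 0
    while order or remaining:
        # Admit already-arrived requests into free slots.
        while order and reqs[order[0]][0] <= step and len(remaining) < max_batch:
            i = order.popleft()
            remaining[i] = reqs[i][1]
        # One decode step for every running request (no-op when idle).
        for i in remaining:
            remaining[i] -= 1
        step += 1
        # Evict finished requests immediately so their slots can be reused.
        for i in [i for i, left in remaining.items() if left == 0]:
            finish[i] = step
            del remaining[i]
    return finish


reqs = [(0, 8), (0, 2), (1, 3), (1, 3)]
print("static:   ", static(reqs, max_batch=2))
print("in-flight:", in_flight(reqs, max_batch=2))
```

With a batch size of 2, static batching keeps the slot of request 1 (2 tokens) idle until request 0 (8 tokens) finishes, so requests 2 and 3 only finish at step 11. In-flight batching hands that slot to request 2 at step 2, so request 2 finishes at step 5 and all four requests are done by step 8. The price is the bookkeeping in `in_flight`: per-step admission and eviction, which in a real server also means allocating and freeing KV cache for each request.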
