KV Cache Compression

Created
2025 Mar 13 16:19
Creator
Seonglae Cho
Edited
2025 Mar 13 16:25
Refs
KV Cache
KV Cache Compression treats the KV cache as external memory, much as RAG treats a document store. For each task, the model runs inference once over the external documents offline, and the resulting KV caches are compressed and accumulated. At query time, the stored caches are loaded directly, so the model works with its own native internal representations instead of re-reading retrieved text, and the online warm-up cost is minimal compared to RAG. Because the stored caches grow with the document collection, the method scales less well, and global tasks that span many documents may be challenging. It is nonetheless an innovative approach that can be particularly useful in specific industry domains.
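The offline/online split described above can be sketched in toy form. This is a minimal single-head, single-layer sketch in pure Python under stated assumptions, not the method of any particular paper: the weight matrices, the helper names (`precompute_kv`, `compress_kv`, `attend`), and the compression policy (keeping only the tokens that receive the most attention from the document's own queries, in the spirit of heavy-hitter eviction) are all hypothetical. Real systems store caches per layer and per head, use causal masking, and must handle positional encodings when concatenating caches from different documents.

```python
import math
import random

DIM = 4  # hypothetical toy hidden size

def make_matrix(seed, dim=DIM):
    # Hypothetical fixed projection weights, standing in for a trained model.
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

W_Q, W_K, W_V = make_matrix(1), make_matrix(2), make_matrix(3)

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def precompute_kv(doc_embeddings):
    """Offline: one pass over a document, storing a (key, value) pair per token."""
    return [(matvec(W_K, x), matvec(W_V, x)) for x in doc_embeddings]

def compress_kv(doc_embeddings, cache, keep):
    """Offline: keep the `keep` tokens that receive the most total attention
    from the document's own queries (an assumed eviction policy; no causal mask)."""
    scores = [0.0] * len(cache)
    for x in doc_embeddings:
        q = matvec(W_Q, x)
        weights = softmax([dot(q, k) / math.sqrt(DIM) for k, _ in cache])
        for i, w in enumerate(weights):
            scores[i] += w
    top = sorted(range(len(cache)), key=lambda i: scores[i], reverse=True)[:keep]
    return [cache[i] for i in sorted(top)]  # preserve original token order

def attend(query_embedding, caches):
    """Online: concatenate the stored caches and run one attention read over them."""
    entries = [kv for cache in caches for kv in cache]
    q = matvec(W_Q, query_embedding)
    weights = softmax([dot(q, k) / math.sqrt(DIM) for k, _ in entries])
    return [sum(w * v[d] for w, (_, v) in zip(weights, entries)) for d in range(DIM)]

# Usage: two toy "documents" of 6 token embeddings each.
rng = random.Random(0)
docs = [[[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(6)] for _ in range(2)]

# Offline phase: run once per document, then compress and store.
store = [compress_kv(emb, precompute_kv(emb), keep=3) for emb in docs]

# Online phase: a query reads directly from the stored, compressed caches.
query = [rng.gauss(0, 1) for _ in range(DIM)]
out = attend(query, store)
```

The expensive pass over each document (`precompute_kv` plus `compress_kv`) runs once offline, while each online query only pays for `attend` over caches that are already shrunk. The store still grows linearly with the number of documents, which is the scalability limit noted above.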
