KV Cache Compression treats the KV cache as external memory, much like RAG: for each task, the model runs inference once over the external documents offline, and the resulting KV caches are compressed and accumulated. Compared with RAG, this makes it more practical to use the model's native embeddings online, with minimal online warmup. Because the method scales less well as the number of documents grows, it may be hard to apply to global tasks. Even so, it is an innovative approach that can be particularly useful in specific industry domains.
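The offline/online split described above can be sketched in pure Python with toy single-head attention. This is a minimal illustration under stated assumptions, not the method's actual implementation. The compression rule is a swapped-in heuristic: keep the tokens that receive the most attention from a small set of observation queries, similar in spirit to SnapKV-style eviction. The names `KVCache`, `compress`, `KVMemory`, `obs_queries`, and `budget` are all hypothetical.

```python
import math
from dataclasses import dataclass

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Vector, keys: list[Vector]) -> list[float]:
    # Scaled dot-product attention weights for a single head.
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])


@dataclass
class KVCache:
    keys: list[Vector]
    values: list[Vector]


def compress(cache: KVCache, obs_queries: list[Vector], budget: int) -> KVCache:
    """Keep the `budget` tokens that receive the most total attention from
    `obs_queries`. ASSUMPTION: SnapKV-like scoring, not the original method."""
    n = len(cache.keys)
    if n <= budget:
        return cache
    scores = [0.0] * n
    for q in obs_queries:
        for i, w in enumerate(attention_weights(q, cache.keys)):
            scores[i] += w
    keep = sorted(range(n), key=lambda i: scores[i], reverse=True)[:budget]
    keep.sort()  # restore original token order
    return KVCache([cache.keys[i] for i in keep], [cache.values[i] for i in keep])


class KVMemory:
    """Hypothetical per-task store of compressed KV caches, built offline and read online."""

    def __init__(self, budget: int):
        self.budget = budget  # hypothetical per-document token budget
        self.store: dict[str, KVCache] = {}

    def add_document(self, task: str, cache: KVCache, obs_queries: list[Vector]) -> None:
        # Offline: compress one document's cache and append it to the task's memory.
        c = compress(cache, obs_queries, self.budget)
        mem = self.store.setdefault(task, KVCache([], []))
        mem.keys.extend(c.keys)
        mem.values.extend(c.values)

    def attend(self, task: str, query: Vector) -> Vector:
        # Online: attend over the accumulated cache without re-encoding any document.
        mem = self.store[task]
        weights = attention_weights(query, mem.keys)
        dim = len(mem.values[0])
        return [sum(w * v[j] for w, v in zip(weights, mem.values)) for j in range(dim)]
```

For example, `KVMemory(budget=2)` keeps two tokens per document, so the task memory grows by at most two entries for every document added. A real system would apply this per layer and per head on tensors, and the scoring rule is only one of several possible choices.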
KV Cache Compression
Created: 2025 Mar 13 16:19
Creator: Seonglae Cho
Editor: Seonglae Cho
Edited: 2025 Mar 13 16:25
Refs: KV Cache