PyramidKV

Creator
Seonglae Cho
Created
2024 Oct 26 14:4
Editor
Edited
2024 Oct 26 14:13
Refs

More cache in lower layers, less in higher layers

LLMs aggregate information through Pyramidal Information Funneling: attention scatters widely in lower layers, progressively consolidates within specific contexts, and ultimately focuses on a few critical tokens.
Motivated by these insights, PyramidKV dynamically adjusts the KV cache size across different layers, allocating more cache in lower layers and less in higher ones.
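A minimal sketch of the layer-wise allocation, assuming the per-layer sizes decrease linearly (an arithmetic sequence) from the bottom layer to the top one. The function name `pyramid_budgets`, the bottom-to-top `ratio`, and the `alpha` floor are hypothetical choices for illustration, not the PyramidKV reference code. The sequence is centered on `avg_budget`, so the total stays close to `avg_budget * num_layers` up to rounding.

```python
def pyramid_budgets(avg_budget: int, num_layers: int,
                    ratio: float = 3.0, alpha: int = 8) -> list[int]:
    """Per-layer KV cache sizes, largest in layer 0 and smallest in the last layer.

    Hypothetical parameters (not from the PyramidKV code):
      ratio: bottom-layer budget divided by top-layer budget.
      alpha: floor so every layer can still keep its most recent tokens.
    """
    if num_layers == 1:
        return [max(alpha, avg_budget)]
    # bottom + top = 2 * avg_budget keeps the sequence's mean at avg_budget.
    top = 2 * avg_budget / (1 + ratio)           # smallest budget (last layer)
    bottom = ratio * top                         # largest budget (first layer)
    step = (bottom - top) / (num_layers - 1)     # common difference
    return [max(alpha, round(bottom - step * layer)) for layer in range(num_layers)]


print(pyramid_budgets(100, 4))
```

For example, `pyramid_budgets(100, 4)` gives `[150, 117, 83, 50]`: the same 400-token total as a uniform 100 tokens per layer, shifted toward the lower layers.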
Alpha (α) is a hyperparameter that sets the number of most recent tokens retained in every layer, since these tokens carry recent, crucial information. The cache sizes of the intermediate layers then follow an arithmetic sequence between the bottom and top layers, forming a pyramid that optimizes memory allocation per layer.
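Within a single layer, the retained set can be sketched as the last α tokens plus the earlier tokens that those α query positions attend to most. The scoring rule (summing attention from the last `alpha` queries over all heads, similar to SnapKV), the function name `select_kv_indices`, and its `(num_heads, alpha, seq_len)` input layout are assumptions for illustration; the per-layer `budget` would come from the pyramid allocation.

```python
import numpy as np


def select_kv_indices(attn: np.ndarray, budget: int, alpha: int) -> np.ndarray:
    """Return sorted token indices to keep in one layer's KV cache.

    attn: hypothetical layout (num_heads, alpha, seq_len), the attention
          weights from the last `alpha` query positions to every key position.
    """
    if budget < alpha:
        raise ValueError("budget must be at least alpha")
    seq_len = attn.shape[-1]
    if budget >= seq_len:
        return np.arange(seq_len)                 # everything fits, keep all tokens
    window = np.arange(seq_len - alpha, seq_len)  # last alpha tokens are always kept
    # Importance of each earlier token: attention summed over heads and queries.
    scores = attn[:, :, : seq_len - alpha].sum(axis=(0, 1))
    k = budget - alpha                            # slots left for earlier tokens
    top = np.argsort(scores)[::-1][:k]            # k highest-scoring earlier tokens
    return np.sort(np.concatenate([top, window]))
```

Because every layer keeps the same α recent tokens, only the number of earlier tokens (`budget - alpha`) shrinks from the lower layers to the higher ones.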

Backlinks

KV Cache
