Compression Valleys

Creator: Seonglae Cho
Created: 2025 Oct 26 23:38
Edited: 2025 Oct 26 23:40
Refs
Sharp drop in representation dimension and entropy in middle layers
The sharp drop in representation dimension and entropy in the middle layers arises from Attention Sink, i.e., Massive Activation in the Residual Stream. Because a single massive direction dominates the hidden states, their effective rank collapses toward 1, which compresses the representations and lowers their entropy.
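A minimal numerical sketch of this mechanism (an illustration, not code from the paper): add one massive shared activation to a matrix of token representations and watch its effective rank, defined here as the exponential of the entropy of the normalized singular values, collapse toward 1.

```python
import numpy as np

def effective_rank(X: np.ndarray) -> float:
    # Effective rank = exp(Shannon entropy of the normalized singular values).
    s = np.linalg.svd(X, compute_uv=False)
    p = s / s.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    return float(np.exp(entropy))

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 128))       # 64 token representations, roughly full rank
print(effective_rank(X))             # high: singular values are spread out

X_sink = X.copy()
X_sink[:, 0] += 1e4                  # massive activation shared by every token
print(effective_rank(X_sink))        # near 1: one direction dominates the spectrum
```

The massive activation concentrates almost all spectral mass in one singular direction, so the entropy of the singular-value distribution, and hence the effective dimension, drops sharply.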
Attention Sinks and Compression Valleys in LLMs are Two Sides of...
Attention sinks and compression valleys have attracted significant attention as two puzzling phenomena in large language models, but have been studied in isolation. In this work, we present a...