Compression Valleys

Creator
Seonglae Cho
Created
2025 Oct 26 23:38
Edited
2025 Oct 26 23:40
Sharp drop in representation dimension and entropy in middle layers
 
 
 
 
 
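The sharp drop in representation dimension and entropy in the middle layers arises from the attention sink, i.e., massive activations in the residual stream. When one token's activation norm dwarfs all the others, the matrix of token representations is dominated by a single direction, so its rank collapses to nearly 1. With almost all of the spectrum concentrated in one singular value, the representation entropy decreases, which is what shows up as compression.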
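The rank-collapse argument can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not a measurement on a real model. The "residual stream" is a random Gaussian matrix, the massive activation is simulated by scaling one token's row by a hypothetical factor of 1000, and entropy is taken over the normalized squared singular values (one common choice, which may differ from the metric behind the original claim). The names `spectral_entropy` and `effective_rank` are illustrative helpers.

```python
import numpy as np


def spectral_entropy(X):
    """Shannon entropy of the normalized squared singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s**2 / np.sum(s**2)
    p = p[p > 0]  # drop exact zeros so log(p) is defined
    return float(-np.sum(p * np.log(p)))


def effective_rank(X):
    """exp(entropy): roughly how many directions carry the energy."""
    return float(np.exp(spectral_entropy(X)))


rng = np.random.default_rng(0)
n_tokens, d_model = 64, 128  # assumed toy sizes

# Stand-in for token representations at one layer (rows = tokens).
X = rng.standard_normal((n_tokens, d_model))

# Simulate a massive activation on the first (sink) token.
X_sink = X.copy()
X_sink[0] *= 1000.0  # hypothetical magnitude, chosen for illustration

rank_before = effective_rank(X)
rank_after = effective_rank(X_sink)
entropy_before = spectral_entropy(X)
entropy_after = spectral_entropy(X_sink)

print(f"effective rank without sink: {rank_before:.2f}")
print(f"effective rank with sink:    {rank_after:.4f}")
```

In this toy setup the effective rank falls from tens of directions to just above 1 once a single token carries nearly all of the norm, which mirrors the rank ≈ 1, low-entropy picture described above.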
 
