Sharp drop in representation dimension and entropy in middle layers
The sharp drop in representation dimension and entropy in the middle layers arises from attention sinks, i.e., massive activations in the residual stream. A massive activation adds a single dominant shared direction to the hidden states, so the effective rank of the representation collapses toward 1, which is observed as representation compression (an entropy decrease).
Attention Sinks and Compression Valleys in LLMs are Two Sides of...
Attention sinks and compression valleys have attracted significant attention as two puzzling phenomena in large language models, but have been studied in isolation. In this work, we present a...
https://arxiv.org/abs/2510.06477
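The rank/entropy collapse above can be illustrated with a minimal NumPy sketch (not the paper's code): effective rank is measured as the exponential of the entropy of the normalized singular values of the token × hidden-dim matrix, and the "massive activation" is simulated as one huge direction shared by all tokens.

```python
import numpy as np

def effective_rank(H):
    """exp of the entropy of normalized singular values of H (tokens, d_model)."""
    s = np.linalg.svd(H, compute_uv=False)
    p = s / s.sum()                          # normalized singular-value distribution
    entropy = -np.sum(p * np.log(p + 1e-12))
    return np.exp(entropy)

rng = np.random.default_rng(0)
H = rng.normal(size=(128, 64))               # generic hidden states: high effective rank

# Simulate a massive activation: one dominant direction shared across all tokens
sink = (np.ones((128, 1)) @ rng.normal(size=(1, 64))) * 100

print(effective_rank(H))          # close to full rank
print(effective_rank(H + sink))   # collapses toward 1, entropy drops
```

The dominant singular value of the rank-1 sink term dwarfs the rest, so almost all probability mass concentrates on one singular value and the entropy (hence effective rank) collapses.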


Seonglae Cho