Sharp drop in representation dimension and entropy in middle layers
The sharp drop in representation dimension and entropy in the middle layers arises from attention sinks, i.e., massive activations in the residual stream. A massive activation adds a single dominant shared direction to the hidden states, so the effective rank of the representation collapses toward 1, which is observed as representation compression (an entropy decrease).
Attention Sinks and Compression Valleys in LLMs are Two Sides of...
Attention sinks and compression valleys have attracted significant attention as two puzzling phenomena in large language models, but have been studied in isolation. In this work, we present a...
https://arxiv.org/abs/2510.06477
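The rank/entropy collapse above can be illustrated with a minimal NumPy sketch (not the paper's code): effective rank is measured as the exponential of the entropy of the normalized singular values of the token × hidden-dim matrix, and the "massive activation" is simulated as one huge direction shared by all tokens.

```python
import numpy as np

def effective_rank(H):
    """exp of the entropy of normalized singular values of H (tokens, d_model)."""
    s = np.linalg.svd(H, compute_uv=False)
    p = s / s.sum()                          # normalized singular-value distribution
    entropy = -np.sum(p * np.log(p + 1e-12))
    return np.exp(entropy)

rng = np.random.default_rng(0)
H = rng.normal(size=(128, 64))               # generic hidden states: high effective rank

# Simulate a massive activation: one dominant direction shared across all tokens
sink = (np.ones((128, 1)) @ rng.normal(size=(1, 64))) * 100

print(effective_rank(H))          # close to full rank
print(effective_rank(H + sink))   # collapses toward 1, entropy drops
```

The dominant singular value of the rank-1 sink term dwarfs the rest, so almost all probability mass concentrates on one singular value and the entropy (hence effective rank) collapses.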


Seonglae Cho