Sharp drop in representation dimension and entropy in middle layers
The sharp drop in representation dimension and entropy in the middle layers arises from Attention Sink, i.e., Massive Activation in the Residual Stream. In other words, the effective rank of the representations collapses to nearly 1, which compresses the representation and lowers its entropy.
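Here is a minimal sketch of the arithmetic. It rests on two assumptions. First, entropy is taken as the Shannon entropy of the normalized squared singular values of a token-by-dimension hidden-state matrix. Second, effective rank is taken as the exponential of that entropy. This is one common convention and not necessarily the one behind this note. Under it, a single massive coordinate is enough to push the effective rank toward 1. The sizes, the sink position (token 0), the dimension index 7, and the magnitude 1e4 are all hypothetical, and random Gaussian vectors stand in for real hidden states.

```python
import numpy as np


def spectral_entropy(x: np.ndarray) -> float:
    """Shannon entropy of the normalized spectrum of X X^T (assumed definition).

    The eigenvalues of X X^T are the squared singular values of X, so we
    normalize sigma**2 by its sum to obtain a probability distribution.
    """
    s = np.linalg.svd(x, compute_uv=False)
    p = s**2 / np.sum(s**2)
    p = p[p > 0]  # drop exact zeros so the log is defined
    return float(-np.sum(p * np.log(p)))


def effective_rank(x: np.ndarray) -> float:
    """Effective rank = exp(entropy); ranges from 1 up to min(n_tokens, d_model)."""
    return float(np.exp(spectral_entropy(x)))


rng = np.random.default_rng(0)
n_tokens, d_model = 64, 128  # hypothetical sizes
hidden = rng.standard_normal((n_tokens, d_model))  # stand-in hidden states

# Hypothetical massive activation: the sink token (index 0) gets one huge coordinate.
massive = hidden.copy()
massive[0, 7] = 1e4

print(f"before: entropy={spectral_entropy(hidden):.3f}, eff. rank={effective_rank(hidden):.1f}")
print(f"after:  entropy={spectral_entropy(massive):.3f}, eff. rank={effective_rank(massive):.3f}")
```

The squared singular values sum to the squared Frobenius norm. One entry of 1e4 contributes 1e8 to that sum, while all the Gaussian entries together contribute only about 8,000. Almost all spectral mass therefore lands on a single direction. The sketch only shows why a massive activation implies low entropy and low effective rank. It does not model how the attention sink produces that activation.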

Seonglae Cho