Layer-wise changes: early=context structure, middle=semantics, late=output control
Lower layers use composition for low-level features, deeper layers exploit superposition by packing polysemantic features into residual bandwidth, and output layers recompose information, making it interpretable by a linear head.
While this is a widely known insight, it's difficult to find clear proof or the original paper that first proposed it

Seonglae Cho