Lower layers use composition to build low-level features, deeper layers exploit superposition by packing polysemantic features into the residual stream's limited bandwidth, and output layers recompose the information, making it readable by a linear head.
While this is a widely known insight, it is difficult to find a clear proof or the original paper that first proposed it.
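Regardless of provenance, the superposition mechanism itself is easy to illustrate. Below is a minimal NumPy sketch (my own illustration, not from the linked post) of the core idea: many sparse features share fewer dimensions via nearly orthogonal directions, and a linear readout recovers active features with only small interference. The feature count, dimensionality, and sparsity are arbitrary assumptions.

```python
# Toy sketch of superposition: 64 sparse features packed into 16 dimensions.
import numpy as np

rng = np.random.default_rng(0)
n_features, d_model = 64, 16

# Random unit vectors serve as nearly-orthogonal feature directions.
W = rng.normal(size=(n_features, d_model))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse input: only a few features active at once.
x = np.zeros(n_features)
x[rng.choice(n_features, size=3, replace=False)] = 1.0

h = x @ W        # superposed activation (residual-stream analogue)
x_hat = h @ W.T  # linear readout along each feature direction

# Active features read out near 1; inactive ones pick up small interference.
print("active readouts:", np.sort(x_hat)[-3:])
print("max interference:", np.abs(x_hat[x == 0]).max())
```

Because the features are sparse, the interference terms stay small, which is what lets the model trade a little noise for extra representational capacity.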
Distributed Representations: Composition & Superposition
Distributed representations are a classic idea in both neuroscience and connectionist approaches to AI. We're often asked how our work on superposition relates to it. Since publishing our original paper on superposition, we've had more time to reflect on the relationship between the topics and discuss it with people, and wanted to expand on our earlier discussion in the related work section and share a few thoughts. (We care a lot about superposition and the structure of distributed representations because decomposing representations into independent components is necessary to escape the curse of dimensionality and understand neural networks.)
https://transformer-circuits.pub/2023/superposition-composition/index.html

Seonglae Cho