Early layers are where the Superposition Hypothesis is most clearly at work, and representational sparsity generally increases as models grow larger. In early layers, polysemantic neurons act in sparse combinations, via superposition, to detokenize n-grams and compound words (e.g., "social security"). In middle layers, there exist effectively monosemantic neurons for contextual features (language, code type, data source, etc.); ablating these neurons in their relevant context increases loss.
Sparse Probing
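A minimal sketch of the k-sparse probing idea, assuming layer activations X and binary feature labels y have already been collected. The variable names, the mean-difference ranking heuristic, and the toy data below are illustrative assumptions, not the paper's exact neuron-selection procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def k_sparse_probe(X, y, k=1):
    # Rank neurons by absolute difference of class-conditional means,
    # a cheap univariate stand-in for a proper sparse feature-selection step.
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == 0].mean(axis=0)
    scores = np.abs(mu_pos - mu_neg)
    top_k = np.argsort(scores)[-k:]  # indices of the k highest-scoring neurons

    # Fit a probe restricted to those k neurons only.
    probe = LogisticRegression(max_iter=1000).fit(X[:, top_k], y)
    return top_k, probe.score(X[:, top_k], y)

# Toy usage with a planted single-neuron feature (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 512))                               # stand-in for MLP activations
y = (X[:, 42] + 0.1 * rng.normal(size=2000) > 0).astype(int)   # feature carried by neuron 42
print(k_sparse_probe(X, y, k=1))
```

If a k=1 probe nearly matches a probe trained on all neurons, the feature is plausibly carried by a single (effectively monosemantic) neuron, as with the middle-layer context neurons above; if accuracy only recovers at larger k, the feature is spread across neurons in superposition, as in early layers.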