Early layers provide some of the clearest evidence for the Superposition Hypothesis. As models grow larger, representational sparsity generally increases. In early layers, sparse combinations of polysemantic neurons "detokenize" n-grams and compound words (e.g., "social security") via superposition. In middle layers, effectively monosemantic neurons emerge for contextual features (language, code type, data source, etc.); ablating these neurons in their relevant context increases the loss.
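A minimal sketch of the k-sparse probing idea, assuming neuron activations have already been collected into a matrix `X` (tokens × neurons) with binary context labels `y` (e.g., 1 = token appears in French text). The function name, the mutual-information ranking, and the synthetic data below are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical k-sparse probe: select the k most informative neurons, then fit
# a probe restricted to them. If k = 1 already gives high accuracy, that neuron
# is effectively monosemantic for the feature; if accuracy only becomes
# competitive for k > 1, the feature is spread across neurons (superposition).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def k_sparse_probe(X: np.ndarray, y: np.ndarray, k: int = 1):
    """Fit a logistic-regression probe on only the k most informative neurons."""
    # Rank neurons by mutual information with the context label.
    mi = mutual_info_classif(X, y, random_state=0)
    top_k = np.argsort(mi)[-k:]

    X_train, X_test, y_train, y_test = train_test_split(
        X[:, top_k], y, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return top_k, probe.score(X_test, y_test)

# Illustrative call on synthetic activations with one planted feature neuron
# (replace with real MLP activations from an early or middle layer):
X = np.random.randn(2000, 512)
y = (X[:, 37] + 0.1 * np.random.randn(2000) > 0).astype(int)
neurons, acc = k_sparse_probe(X, y, k=1)
print(neurons, acc)  # a k=1 probe should recover neuron 37 with high accuracy
```

Sweeping k and the layer index is what distinguishes the two regimes described above: early-layer features need several neurons, while middle-layer contextual features are often captured by a single one.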
Sparse Probing