Sparse Probing

Creator
Seonglae Cho
Created
2025 Jul 1 14:54
Edited
2025 Aug 5 17:44
Early layers are the key point of the Superposition Hypothesis: as models grow larger, representational sparsity generally increases. In early layers, polysemantic neurons fire in sparse combinations to detokenize n-grams and compound words (e.g., "social security") via superposition. In middle layers, there exist effectively monosemantic neurons for contextual features (language, code type, data source, etc.), and ablating such a neuron in its relevant context increases the loss.
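A minimal sketch of k-sparse probing under stated assumptions: `acts` is an activation matrix of shape `[n_tokens, d_mlp]` collected from one layer, and `y` is a binary label for some context feature (e.g., "token is in French text"). The mean-difference neuron selector here is just one simple heuristic; other selectors work too.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def k_sparse_probe(acts: np.ndarray, y: np.ndarray, k: int = 1):
    """Fit a probe restricted to the k neurons whose mean activation
    differs most between positive and negative examples."""
    X_tr, X_te, y_tr, y_te = train_test_split(acts, y, test_size=0.2, random_state=0)
    # Rank neurons by absolute mean activation difference between classes.
    score = np.abs(X_tr[y_tr == 1].mean(0) - X_tr[y_tr == 0].mean(0))
    top_k = np.argsort(score)[-k:]
    probe = LogisticRegression(max_iter=1000).fit(X_tr[:, top_k], y_tr)
    return top_k, probe.score(X_te[:, top_k], y_te)
```

If accuracy is already high at k=1, the feature is carried by a single, effectively monosemantic neuron (typical of middle layers); if accuracy only rises at larger k, the feature is spread across a sparse combination of neurons, as superposition predicts for early layers.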
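And a hedged sketch of the ablation check: zero one MLP neuron during the forward pass and compare the language-modeling loss on in-context text. This assumes a GPT-2-style HuggingFace model; the layer and neuron indices below are placeholders for illustration, not neurons reported in the literature.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
tok = AutoTokenizer.from_pretrained("gpt2")

def lm_loss(text, layer=None, neuron=None):
    handle = None
    if layer is not None:
        # Zeroing the pre-GELU activation zeroes the neuron (GELU(0) = 0).
        def zero_neuron(module, inputs, output):
            output[..., neuron] = 0.0
            return output
        handle = model.transformer.h[layer].mlp.c_fc.register_forward_hook(zero_neuron)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss.item()
    if handle is not None:
        handle.remove()
    return loss

text = "Le chat est assis sur le tapis."   # French context
print(lm_loss(text))                        # baseline loss
print(lm_loss(text, layer=6, neuron=123))   # hypothetical indices; loss should
                                            # rise if the neuron tracks French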
 
 
