Early layers are where the Superposition Hypothesis is most clearly at work, and representational sparsity generally increases as models grow larger. In early layers, polysemantic neurons act in sparse combinations, via superposition, to detokenize n-grams and compound words (e.g., "social security"). In middle layers, there exist effectively monosemantic neurons for contextual features (language, code type, data source, etc.); ablating these neurons in their relevant context increases loss.
Sparse Probing
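A minimal sketch of the k-sparse probing idea, assuming layer activations X and binary feature labels y have already been collected. The variable names, the mean-difference ranking heuristic, and the toy data below are illustrative assumptions, not the paper's exact neuron-selection procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def k_sparse_probe(X, y, k=1):
    # Rank neurons by absolute difference of class-conditional means,
    # a cheap univariate stand-in for a proper sparse feature-selection step.
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == 0].mean(axis=0)
    scores = np.abs(mu_pos - mu_neg)
    top_k = np.argsort(scores)[-k:]  # indices of the k highest-scoring neurons

    # Fit a probe restricted to those k neurons only.
    probe = LogisticRegression(max_iter=1000).fit(X[:, top_k], y)
    return top_k, probe.score(X[:, top_k], y)

# Toy usage with a planted single-neuron feature (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 512))                               # stand-in for MLP activations
y = (X[:, 42] + 0.1 * rng.normal(size=2000) > 0).astype(int)   # feature carried by neuron 42
print(k_sparse_probe(X, y, k=1))
```

If a k=1 probe nearly matches a probe trained on all neurons, the feature is plausibly carried by a single (effectively monosemantic) neuron, as with the middle-layer context neurons above; if accuracy only recovers at larger k, the feature is spread across neurons in superposition, as in early layers.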