Superposition Hypothesis Negative Evidence

Creator
Seonglae Cho
Created
2026 Feb 10 18:51
Edited
2026 Feb 10 18:51

Contradict 2024

Traditional interpretations assume that superposition works through linear combinations of feature directions, but in practice the positions of the feature vectors relative to one another also play a crucial role. For example, features for days of the week or months of the year are arranged in a circle, and position embeddings form a helix. Superposition itself is not in doubt; what remains unclear is whether it operates as a linear combination.
The core argument is that the relative placement of feature vectors (their distances, directions, and overall patterns) carries additional meaning in how the model processes information. The counterargument is that "position", meaning the structural information that results from selecting a specific basis, is not essential.
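To make the circular arrangement concrete, here is a minimal toy sketch in Python. It is not taken from the referenced post or from any real model: the 2-D unit-circle placement, the helper names (`day_vector`, `rotate`, `nearest_day`), and the choice of seven evenly spaced angles are all assumptions made for illustration. It shows the kind of information that lives in the relative geometry of feature vectors rather than in any single direction: neighbouring days are more similar than distant ones, and "add k days" becomes a rotation.

```python
import math

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def day_vector(index, n=len(DAYS)):
    """Hypothetical 2-D feature direction for a day, placed on the unit circle."""
    angle = 2 * math.pi * index / n
    return (math.cos(angle), math.sin(angle))

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def rotate(v, steps, n=len(DAYS)):
    """Rotate a vector by `steps` day-sized increments around the circle."""
    angle = 2 * math.pi * steps / n
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def nearest_day(v):
    """Index of the day whose toy feature direction is closest to `v`."""
    return max(range(len(DAYS)), key=lambda i: cosine(v, day_vector(i)))

vectors = [day_vector(i) for i in range(len(DAYS))]

# Relative geometry encodes adjacency: Mon is closer to Tue than to Thu.
sim_adjacent = cosine(vectors[0], vectors[1])
sim_far = cosine(vectors[0], vectors[3])

# Relative geometry supports computation: Sat rotated by 3 steps lands on Tue.
shifted = DAYS[nearest_day(rotate(vectors[5], 3))]
```

If the seven vectors were treated only as an unordered set of independent directions, the dictionary would look the same after shuffling which day each vector stands for, yet both the adjacency structure and the rotation trick above would break. That is the sense in which feature geometry may carry meaning beyond a plain linear combination of features.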
SAE feature geometry is outside the superposition hypothesis — LessWrong
Written at Apollo Research • Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations…
