Superposition Hypothesis Negative Evidence

Contradict 2024

Conventional interpretations assume that superposition works through linear combinations of feature vectors, but the positions of the feature vectors themselves also play a crucial role. Features for days of the week or months are arranged in a circle, and position embeddings form a helix. That superposition occurs is not in dispute; what remains unclear is whether it operates purely as a linear combination.
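
To make the circular arrangement concrete, here is a minimal NumPy sketch. It is a toy construction, not taken from the linked post or from any real model: the function name `circular_features`, the embedding dimension of 16, and the random 2D plane are all assumptions chosen for illustration. It places seven "day" directions evenly on a circle and shows that an activation built as a linear combination of them still reads out differently depending on where each day sits on the circle.

```python
import numpy as np

def circular_features(n, dim, seed=0):
    """Hypothetical helper: n unit feature directions spaced evenly on a
    circle that lies in a random 2D plane of R^dim."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis (dim x 2) for a random 2D plane.
    plane, _ = np.linalg.qr(rng.standard_normal((dim, 2)))
    angles = 2 * np.pi * np.arange(n) / n
    coords = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (n, 2)
    return coords @ plane.T  # (n, dim), each row has unit norm

# Seven toy "day of the week" features in an assumed 16-dimensional space.
days = circular_features(7, dim=16)

# Pairwise cosine similarity depends only on circular distance between days.
cos = days @ days.T

# Superposition view: an activation is a weighted sum of feature directions.
weights = np.zeros(7)
weights[2] = 1.0                 # only "day 2" is active
activation = weights @ days

# Reading the activation out along every day direction: neighbouring days
# get a partial response purely because of where they sit on the circle.
readout = days @ activation
```

In this toy setup, day 0 and day 6 are as similar as day 0 and day 1, because the circle wraps around. That wrap-around is information carried by the arrangement of the vectors, not by the list of which features are active.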

The core argument is that the relative arrangement of feature vectors (their distances, directions, and patterns) carries additional meaning in how the model processes information. The counterargument is that "position", meaning the structural information that results from selecting a specific basis, is not essential.
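
One way to read the distinction between basis-dependent "position" and relative arrangement is sketched below. This is an illustrative assumption about how to frame the two sides, not the linked post's own formalization: the random feature vectors, the dimensions, and the variable names are all hypothetical. The sketch applies a random orthogonal change of basis and checks that the raw coordinates change while every pairwise cosine similarity stays the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five hypothetical unit-norm feature directions in an assumed 8-dim space.
features = rng.standard_normal((5, 8))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Random orthogonal matrix (Q factor of a Gaussian matrix): a change of basis.
rotation, _ = np.linalg.qr(rng.standard_normal((8, 8)))
rotated = features @ rotation

# Gram matrices hold all pairwise cosine similarities between features.
gram_before = features @ features.T
gram_after = rotated @ rotated.T

# Coordinates depend on the chosen basis; relative geometry does not.
coordinates_changed = not np.allclose(features, rotated)
geometry_preserved = np.allclose(gram_before, gram_after)
```

Under this reading, the coordinates of a single vector are an artifact of the basis, whereas distances and angles between vectors survive any orthogonal change of basis. Whether those surviving relations are essential to the model's computation is the point in dispute.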

SAE feature geometry is outside the superposition hypothesis — LessWrong

Written at Apollo Research • Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations…

https://www.lesswrong.com/posts/MFBTjb2qf3ziWmzz6/sae-feature-geometry-is-outside-the-superposition-hypothesis
