Superposition Hypothesis

Creator: Seonglae Cho
Created: 2024 Apr 2 4:45
Edited: 2024 Jul 25 2:48
The superposition hypothesis postulates that neural networks “want to represent more features than they have neurons”.
This is consistent with the fact that the brain does not activate all of its neurons at once: likewise, only some of a model's neurons are active at a time, which prevents overlap between superposed functions. The idea that we only use 10% of our brain's capacity arose because superposition was not understood; in reality, activating everything at once would render the representation meaningless.
Feature activations should be sparse, because sparsity is what enables this kind of noisy simulation (Compressed sensing).
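As a rough illustration of the compressed-sensing connection (a minimal sketch with arbitrary sizes, using plain ISTA for recovery, not anything from the paper): a sparse signal can be recovered from far fewer random measurements than its ambient dimension.

```python
import numpy as np

# Illustrative only: recover a 4-sparse vector in R^100 from 30 random
# linear measurements using ISTA (iterative soft thresholding).
rng = np.random.default_rng(0)
n, m, k = 100, 30, 4                      # ambient dim, measurements, nonzeros

x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)

A = rng.normal(size=(m, n)) / np.sqrt(m)  # random measurement matrix
y = A @ x_true                            # compressed observation

lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / (largest singular value)^2
x = np.zeros(n)
for _ in range(2000):
    grad = A.T @ (A @ x - y)
    x = x - step * grad
    x = np.sign(x) * np.maximum(np.abs(x) - lam * step, 0.0)  # soft threshold

# Typically recovers x_true approximately when the signal is sparse enough.
print("recovery error:", np.linalg.norm(x - x_true))
```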
When a toy model is trained with 2 features and a single hidden dimension followed by a ReLU, the two features end up embedded antipodally. As expected, sparsity is necessary for superposition to occur.
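A minimal sketch of that setup (my own PyTorch illustration, not Anthropic's code), assuming uniform feature importance and reconstruction as ReLU(WᵀWx + b):

```python
import torch

# 2 sparse features compressed into 1 hidden dimension, reconstructed through
# a ReLU. At high sparsity the two columns of W typically end up antipodal,
# i.e. sharing the single hidden dimension in opposite directions.
torch.manual_seed(0)
n_features, n_hidden, sparsity = 2, 1, 0.9

W = torch.nn.Parameter(0.1 * torch.randn(n_hidden, n_features))
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-3)

for _ in range(10_000):
    # Synthetic data: each feature is zero with probability `sparsity`,
    # otherwise uniform in [0, 1].
    x = torch.rand(1024, n_features)
    x = x * (torch.rand(1024, n_features) > sparsity)

    x_hat = torch.relu((x @ W.T) @ W + b)   # compress, then reconstruct
    loss = ((x_hat - x) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

print(W.detach())   # e.g. roughly [[+w, -w]]: the antipodal arrangement
```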
Associating a memory with the related things that should come to mind alongside it can be interpreted as a form of superposition. The emergence of backup neurons under Dropout is another interpretation.

Architectural approach

Several different approaches to addressing superposition have been highlighted. One of them is to engineer models so that they simply do not have superposition in the first place.
It is usually assumed that 10 dimensions give you only 10 orthogonal basis directions, but in fact you can fit 5 nearly-orthogonal directions into just 2 dimensions, at the cost of some interference, as the sketch below shows.
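A quick numeric check of this claim (the regular pentagon arrangement is an assumption for the illustration):

```python
import numpy as np

# Five unit vectors arranged as a regular pentagon in 2 dimensions. They are
# not orthogonal, but the pairwise interference is bounded, which is what
# lets 5 features share 2 dimensions.
angles = 2 * np.pi * np.arange(5) / 5
V = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # shape (5, 2), unit rows

gram = V @ V.T                 # pairwise dot products between the 5 directions
np.fill_diagonal(gram, 0.0)
print("max interference:", gram.max())   # cos(72°) ≈ 0.31
```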
But it isn't always the case that features correspond so cleanly to neurons, especially in large language models where it actually seems rare for neurons to correspond to clean features.
In the Toy Models of Superposition paper, Anthropic uses toy models: small ReLU networks trained on synthetic data with sparse input features. When features are sparse, superposition allows compression beyond what a linear model could do, at the cost of "interference" that requires nonlinear filtering.
  • Superposition is a real, observed phenomenon.
  • Both monosemantic and polysemantic neurons can form.
  • At least some kinds of computation can be performed in superposition.
  • Whether features are stored in superposition is governed by a phase change.
  • Superposition organizes features into geometric structures.

Phase change

If we make one of the features sufficiently sparse relative to the others, there is a phase change: the geometry collapses from a pentagon to a pair of digons, with the sparser feature set to zero. The phase change corresponds to the loss curves of the two geometries crossing over.
A more complicated form of non-uniform superposition occurs when there are correlations between features. This seems essential for understanding superposition in the real world, where many features are correlated or anti-correlated.

Phase change for In-context learning

Induction heads may be the mechanistic source of general in-context learning in transformer models of any size.
A phase change occurs early in training for language models of every size (provided they have more than one layer), and it is visible as a bump in the training loss. During this phase change, the majority of in-context learning ability (as measured by the difference in loss between tokens early and late in the sequence) is acquired; simultaneously, induction heads form within the model that are capable of implementing fairly abstract and fuzzy versions of pattern completion.
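A hedged sketch of that metric, assuming per-token losses are already computed; the specific early/late positions below are illustrative, not Anthropic's exact code:

```python
import torch

# In-context learning measured as the difference in per-token loss between a
# late and an early position in the context.
def in_context_learning_score(per_token_loss: torch.Tensor,
                              early: int = 50, late: int = 500) -> float:
    """per_token_loss: (n_sequences, seq_len) cross-entropy at each position.
    More negative scores mean the model benefits more from its context."""
    return (per_token_loss[:, late] - per_token_loss[:, early]).mean().item()
```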

AI Feature Dimensionality

Is there a way we could understand what "fraction of a dimension" a specific feature gets?
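The paper measures this with a per-feature dimensionality. A sketch of that formula as I understand it (the helper below is my own):

```python
import numpy as np

# D_i = ||W_i||^2 / sum_j (Ŵ_i · W_j)^2, where W_i is the embedding column of
# feature i and Ŵ_i its unit vector.
def feature_dimensionality(W: np.ndarray) -> np.ndarray:
    """W has shape (n_hidden, n_features); returns one dimensionality per feature."""
    norms = np.linalg.norm(W, axis=0)              # ||W_i||
    W_hat = W / (norms + 1e-12)                    # unit columns Ŵ_i
    interference = (W_hat.T @ W) ** 2              # (Ŵ_i · W_j)^2
    return norms ** 2 / interference.sum(axis=1)

# Sanity check: a regular pentagon in 2D gives each feature 2/5 of a dimension,
# and an antipodal pair in 1D gives each feature 1/2.
angles = 2 * np.pi * np.arange(5) / 5
pentagon = np.stack([np.cos(angles), np.sin(angles)])    # shape (2, 5)
print(feature_dimensionality(pentagon))                  # ≈ [0.4, 0.4, 0.4, 0.4, 0.4]
```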
Perhaps the most striking phenomenon Anthropic has noticed is that the learning dynamics of toy models with large numbers of features appear to be dominated by "energy level jumps", where features jump between different feature dimensionalities.
 
 
 
 
