Attribution-based Parameter Decomposition
Minimizing Mechanistic Description Length
A substitute for traditional matrix decomposition
Superposition Hypothesis
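- Right singular vectors align with the activation directions that cause a parameter component to have downstream causal effects, i.e. the input directions the component reads from (passed forward through the component, each right singular vector is mapped onto its matching left singular vector on the output side, scaled by the singular value)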
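- Left singular vectors are the directions in which the component's outputs have downstream causal effects, i.e. the output directions it writes to; gradients arriving along them are what activate the component (passed backward through the component, each left singular vector is mapped onto its matching right singular vector on the input side, scaled by the singular value)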
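- Parameter components can be decomposed as a sum of outer products of their (un-normed) left and right singular vectors, with the singular values absorbed into the vectors; a rank-one component is a single such outer product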
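The singular-vector relations in the bullets above can be checked numerically. This is a minimal NumPy sketch under stated assumptions: a random 4x3 matrix stands in for a parameter component, and splitting each singular value as sqrt(s_i) between the left and right vectors is just one hypothetical convention for "un-normed" vectors, not necessarily the one used in APD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a parameter component: maps 3-dim inputs to
# 4-dim outputs via y = P @ x.
P = rng.standard_normal((4, 3))

# Thin SVD: P = U @ diag(s) @ Vt. Columns of U are left singular vectors
# (output space); rows of Vt are right singular vectors (input space).
U, s, Vt = np.linalg.svd(P, full_matrices=False)

for i in range(len(s)):
    u_i, v_i = U[:, i], Vt[i]
    # Forward: a right singular vector as input yields output along u_i.
    assert np.allclose(P @ v_i, s[i] * u_i)
    # Backward: a gradient along u_i maps back to the input side along v_i.
    assert np.allclose(P.T @ u_i, s[i] * v_i)

# Assumed convention: absorb sqrt(s_i) into each vector to un-norm them.
a = U * np.sqrt(s)       # column i is sqrt(s_i) * u_i
b = Vt.T * np.sqrt(s)    # column i is sqrt(s_i) * v_i

# P equals the sum of outer products of matching un-normed vectors.
P_rebuilt = sum(np.outer(a[:, i], b[:, i]) for i in range(len(s)))
assert np.allclose(P, P_rebuilt)

# A rank-one component is a single outer product.
P_rank1 = np.outer(a[:, 0], b[:, 0])
print(np.linalg.matrix_rank(P_rank1))  # 1
```

Absorbing all of s_i into the left vector instead would give the same sum; the symmetric split is chosen here only so that neither side is privileged.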