APD

Creator
Seonglae Cho
Created
2025 Jan 28 14:3
Edited
2025 Feb 27 16:4

Attribution-based Parameter Decomposition

Minimizing Mechanistic Description Length

Proposed as a substitute for traditional Matrix Decomposition.


Superposition Hypothesis

  • Right singular vectors align with the activation directions that lead the parameter components to have downstream causal effects (update to align output)
  • Left singular vectors are the directions in which activations have downstream causal effects, and they align with the gradients that activate that component (update to align input)
  • Parameter components can be decomposed as an outer product of their (un-normed) left and right singular vectors
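
The outer-product point in the last bullet can be checked numerically. The following is a minimal sketch, not the APD training procedure itself: it assumes the convention y = P x (so right singular vectors live in the input space and left singular vectors in the output space) and a single rank-1 component. The names `u`, `v`, `P`, and `x` are illustrative and do not come from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical un-normed singular vectors of one parameter component.
u = rng.normal(size=4)  # left vector: lives in the layer's output space
v = rng.normal(size=3)  # right vector: lives in the layer's input space

# A rank-1 parameter component is the outer product of the two vectors.
P = np.outer(u, v)      # shape (4, 3), applied as y = P @ x

# SVD recovers a single non-zero singular value, equal to |u| * |v|.
U, S, Vt = np.linalg.svd(P)
rank = int(np.sum(S > 1e-10))
sigma = S[0]

# The component only "reads" the part of the input that lies along v:
# an input orthogonal to v produces (numerically) zero output.
x = rng.normal(size=3)
x_orth = x - (x @ v) / (v @ v) * v
out_orth = P @ x_orth

# The component only "writes" along u: the output is a scalar multiple of u.
y = P @ x
```

Here `y` equals `(v @ x) * u`, so the right vector determines which input directions trigger the component and the left vector determines the direction of its effect on the output.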
 
 
 
 
 
 
 
