Stochastic Parameter Decomposition

Creator: Seonglae Cho
Created: 2025 Jun 28 22:19
Edited: 2025 Oct 1 23:04

SPD, APD++

  • Rank-1 subcomponents decompose each layer's weights, representing the entire matrix as a sum of low-dimensional parts (sketched in the code below)
  • Probabilistic masking randomly removes "unnecessary" subcomponents for each input while a reconstruction loss keeps the output intact
  • A small MLP is trained to predict the "causal importance" of each subcomponent, encouraging as many subcomponents as possible to be deactivated
  • With less hyperparameter tuning than APD, it accurately recovers the original mechanisms in various toy models (superposition, distributed representations, compressed computation, etc.)
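
A minimal PyTorch sketch of this setup (an illustration under assumed shapes and names, not the authors' implementation): the weight is parameterized as a sum of rank-1 subcomponents, and a per-input mask gates each subcomponent's contribution.

```python
import torch
import torch.nn as nn

class RankOneDecomposedLinear(nn.Module):
    """A linear layer whose weight is a sum of rank-1 subcomponents."""

    def __init__(self, d_in: int, d_out: int, n_components: int):
        super().__init__()
        # Subcomponent c is the outer product of U[:, c] and V[c, :].
        self.U = nn.Parameter(torch.randn(d_out, n_components) * 0.02)
        self.V = nn.Parameter(torch.randn(n_components, d_in) * 0.02)

    def weight(self) -> torch.Tensor:
        # Summing all rank-1 subcomponents recovers the full weight: W = U @ V.
        return self.U @ self.V

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # mask: (batch, n_components), per-input gates on each subcomponent.
        reads = x @ self.V.T              # how strongly x aligns with each V[c, :]
        return (reads * mask) @ self.U.T  # write along each component's U[:, c]
```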
Traditional interpretability has focused on activation space, which has limitations. This approach instead interprets parameter space itself, decomposing each weight matrix into a sum of rank-1 matrices (outer products), defined as subcomponents. Rank-1 is the minimal computational unit that detects a specific direction in the input and records a signal along a specific direction in the output. The method assumes that only a few subcomponents are active for any given input, and verifies that ablating (removing) the unnecessary subcomponents does not change the output.
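
Concretely, for an input $x$, a rank-1 subcomponent $u v^\top$ computes

$$(u v^\top)\,x = (v^\top x)\,u$$

reading the strength of direction $v$ in the input and writing that scalar along direction $u$ in the output.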
  • Weight faithfulness: the sum of the decomposed subcomponents must equal the original weights.
  • Stochastic reconstruction: the output must be preserved even when unnecessary subcomponents are randomly removed for each input.
  • Minimality: computation is encouraged to use as few subcomponents as possible (a hedged sketch of these losses follows).
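
A hedged sketch of how the three losses might be combined during training, reusing RankOneDecomposedLinear from above. The names, loss weights, and the choice to feed the importance MLP the component read-offs are assumptions, not the paper's exact formulation:

```python
import torch

def spd_losses(layer, target_weight, x, target_out, importance_mlp):
    # 1) Weight faithfulness: the subcomponents must sum to the original weights.
    faithfulness = ((layer.weight() - target_weight) ** 2).mean()

    # 2) Stochastic reconstruction: a small MLP predicts each subcomponent's
    #    causal importance g in [0, 1]; masks are sampled uniformly from [g, 1],
    #    so important components stay on while unimportant ones get random
    #    masks. The output must survive these random ablations.
    reads = x @ layer.V.T                      # (batch, n_components)
    g = torch.sigmoid(importance_mlp(reads))   # predicted causal importance
    mask = g + (1.0 - g) * torch.rand_like(g)  # sample m ~ U(g, 1) per input
    reconstruction = ((layer(x, mask) - target_out) ** 2).mean()

    # 3) Minimality: push importances toward zero so few components stay active.
    minimality = g.abs().mean()

    # Loss weights are illustrative hyperparameters.
    return faithfulness + reconstruction + 0.1 * minimality
```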
While Activation Engineering applies to dynamic, runtime interventions, parameter decomposition enables permanent deletion or editing of specific knowledge.
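
For instance, once a subcomponent is identified with an unwanted behavior, it could be deleted permanently by subtracting its rank-1 term from the weights. An illustrative snippet reusing the layer sketch above (the index c is hypothetical):

```python
import torch

c = 0  # index of the subcomponent to delete (illustrative)
with torch.no_grad():
    # Unlike activation steering, this edit persists in the parameters.
    W_edited = layer.weight() - torch.outer(layer.U[:, c], layer.V[c, :])
```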