Stochastic Parameter Decomposition

Creator
Seonglae Cho
Created
2025 Jun 28 22:19
Edited
2025 Jul 21 21:53

SPD, APD++ (a successor to Attribution-based Parameter Decomposition, APD)

  • Each layer's weight matrix is decomposed into rank-1 subcomponents (outer products of two vectors) that sum to the full matrix
  • Stochastic masking randomly ablates, for each input, the subcomponents judged "unnecessary" for that input, while training minimizes the reconstruction loss against the original model's output
  • A small MLP is trained to predict the "causal importance" of each subcomponent on each input, with a penalty that pushes as many subcomponents as possible toward being inactive
  • It needs less hyperparameter tuning than APD and accurately recovers the ground-truth mechanisms in various toy models (superposition, distributed representations, compressed computation, etc.)
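
The steps above can be sketched for a single layer in NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: every name here (`compose_weight`, `causal_importance`, `sample_mask`, `losses`) is hypothetical, the importance MLP reads the raw layer input and clips its output to [0, 1], masks are drawn uniformly between the predicted importance and 1, reconstruction is measured with squared error, and no training loop is shown.

```python
import numpy as np

def compose_weight(U, V, mask):
    """Sum the rank-1 subcomponents U[:, c] V[:, c]^T, each scaled by mask[c]."""
    # Broadcasting scales column c of U by mask[c]; the matmul sums over c.
    return (U * mask) @ V.T

def causal_importance(x, mlp):
    """Small MLP giving one importance value in [0, 1] per subcomponent.
    Assumption: it reads the layer input x and clips its output."""
    W1, b1, W2, b2 = mlp
    hidden = np.maximum(0.0, x @ W1 + b1)
    return np.clip(hidden @ W2 + b2, 0.0, 1.0)

def sample_mask(importance, rng):
    """Assumed sampling rule: mask[c] ~ Uniform(importance[c], 1).
    Importance 1 keeps the component; importance 0 allows any ablation."""
    return importance + (1.0 - importance) * rng.uniform(size=importance.shape)

def losses(x, U, V, mlp, rng, p=1.0):
    """Reconstruction error under a sampled mask, plus an importance penalty."""
    g = causal_importance(x, mlp)
    mask = sample_mask(g, rng)
    target = compose_weight(U, V, np.ones_like(g)) @ x   # unmasked layer output
    masked = compose_weight(U, V, mask) @ x              # stochastically ablated
    recon = float(np.sum((masked - target) ** 2))
    sparsity = float(np.sum(g ** p))  # pushes importances toward zero
    return recon, sparsity, g, mask

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, hidden, n_components = 4, 3, 8, 6
U = rng.normal(size=(d_out, n_components))
V = rng.normal(size=(d_in, n_components))
mlp = (rng.normal(size=(d_in, hidden)), np.zeros(hidden),
       rng.normal(size=(hidden, n_components)), np.zeros(n_components))
x = rng.normal(size=d_in)

recon, sparsity, g, mask = losses(x, U, V, mlp, rng)
print("reconstruction loss:", recon, "importance penalty:", sparsity)
```

In a training setup both terms would be minimized jointly: the reconstruction term keeps the output unchanged when low-importance subcomponents are ablated, and the penalty term keeps the number of subcomponents marked important for each input small.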
