SPD, APD++

- Rank-1 subcomponents decompose each layer's weights, representing the entire matrix as a sum of many low-dimensional components
- Stochastic masking randomly ablates "unnecessary" components for each input while a reconstruction loss keeps the output intact
- A small MLP is trained to predict the "causal importance" of each subcomponent, encouraging as many components as possible to be deactivated (see the sketch after this list)
- With less hyperparameter tuning than APD, SPD accurately recovers the original mechanisms in various toy models (superposition, distributed representations, compressed computation, etc.)
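A minimal sketch of this setup, assuming PyTorch. The names (`SubcomponentLayer`, `gate_mlp`) and layer sizes are illustrative, not the paper's released code, and the mask-sampling rule is one simple choice consistent with the bullets above:

```python
import torch
import torch.nn as nn

class SubcomponentLayer(nn.Module):
    """One d_out x d_in weight matrix represented as a sum of C rank-1 outer products."""
    def __init__(self, d_in: int, d_out: int, num_subcomponents: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(num_subcomponents, d_out) * 0.02)  # output directions u_c
        self.V = nn.Parameter(torch.randn(num_subcomponents, d_in) * 0.02)   # input directions v_c
        # Small MLP predicting each subcomponent's "causal importance" g_c(x) in [0, 1].
        self.gate_mlp = nn.Sequential(
            nn.Linear(d_in, 64), nn.ReLU(),
            nn.Linear(64, num_subcomponents), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate_mlp(x)                 # (batch, C) predicted causal importances
        # Stochastic mask m_c ~ U(g_c, 1): components judged important (g_c near 1)
        # are always kept; unimportant ones (g_c near 0) are randomly ablated.
        m = g + (1.0 - g) * torch.rand_like(g)
        inner = x @ self.V.T                 # (batch, C) projections v_c . x
        return (m * inner) @ self.U          # (batch, d_out) masked forward pass

layer = SubcomponentLayer(d_in=16, d_out=8, num_subcomponents=40)
y = layer(torch.randn(4, 16))                # (4, 8)
```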
Stochastic Parameter Decomposition
https://arxiv.org/pdf/2506.20790
Towards Scalable Parameter Decomposition
The most successful methods so far, like SAEs, have focused largely on the activations that flow through a model, rather than directly inspecting the weights that transform and guide the inputs into those flows. It's a bit like trying to understand a program by only looking at its runtime variables, but never its source code. Parameter decomposition offers a way to decompose a model’s parameters—the 'source code'—into components that reveal not only what the network computes, but how it computes it. Today, we're releasing a paper on Stochastic Parameter Decomposition (SPD), which removes key barriers to the scalability of prior methods.
https://www.goodfire.ai/papers/stochastic-param-decomp

Traditional interpretability focused on neuron activation space has limitations. The new approach interprets parameter space directly, decomposing each weight matrix into a sum of rank-1 matrices (outer products), called subcomponents. Rank-1 is the minimal unit of computation that detects a specific direction in the input and writes a signal along a specific direction in the output. The method assumes that only a few subcomponents are causally active for any given input, and checks that ablating the unneeded subcomponents leaves the output unchanged.
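In symbols (notation mine, not necessarily the paper's): each weight matrix $W$ is written as a sum of rank-1 outer products, so its action on an input $x$ splits into a detection step and a writing step per subcomponent:

$$
W \approx \sum_{c=1}^{C} u_c v_c^\top,
\qquad
W x \approx \sum_{c=1}^{C} \underbrace{(v_c^\top x)}_{\text{detect}} \, u_c
$$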
- Weight faithfulness: the sum of the decomposed subcomponents must reconstruct the original weight matrix.
- Stochastic reconstruction: the output must be preserved even when unimportant subcomponents are randomly ablated for each input.
- Minimality: predicted causal importances are penalized so that computation uses as few subcomponents as possible (all three losses are sketched below).
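A hedged sketch of these three losses, continuing the `SubcomponentLayer` example above; the exact loss forms (MSE, the $L_p$ exponent) and how the terms are weighted are assumptions for illustration:

```python
def spd_losses(layer, target_W, x, target_out):
    # 1. Weight faithfulness: the rank-1 components must sum to the original weights.
    W_hat = layer.U.T @ layer.V                      # (d_out, d_in)
    faithfulness = ((W_hat - target_W) ** 2).mean()

    # 2. Stochastic reconstruction: the randomly masked forward pass should still
    #    match the original model's output on this input.
    reconstruction = ((layer(x) - target_out) ** 2).mean()

    # 3. Minimality: push causal importances toward zero so that few subcomponents
    #    stay active per input (an L_p penalty with p < 1 is one option).
    g = layer.gate_mlp(x)
    minimality = (g.abs() ** 0.5).sum(dim=-1).mean()

    return faithfulness, reconstruction, minimality
```

In training these would be combined as a weighted sum, trading off exact weight recovery against sparsity of active subcomponents.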
While activation engineering intervenes transiently at inference time, parameter decomposition enables permanent deletion or editing of specific knowledge directly in the weights.
Making sense of parameter-space decomposition — LessWrong
https://www.lesswrong.com/posts/Wo22C8vhveDbDWhAc/making-sense-of-parameter-space-decomposition

Seonglae Cho