L3D

Local Loss Landscape Decomposition

L3D identifies a set of low-rank subnetworks: directions in parameter space, a subset of which can reconstruct the gradient of the loss between any sample's output and a reference output vector. Concretely, it reconstructs the loss gradients between samples and randomly chosen reference samples as linear combinations of a small number of low-dimensional directions, the 'subnetworks' or 'circuits'. Moving the parameters along these directions can selectively turn specific functionalities on and off, much like a Steering Vector.

The model's parameter space has degrees of freedom (subnetworks) along which performance stays invariant under changes in multiple directions, not only over the entire dataset but also over specific subsets of the data. In other words, there are distinct "circuits" (directions) for both the global and the local distributions. From here on, these parameter-space directions are called "subnetworks": low-dimensional parameter circuits that activate on a per-sample basis.

To find these local circuits, we decompose the gradient of the loss between each sample's output and the output of a randomly chosen reference sample. This gradient effectively reveals the "parameter directions that only affect this sample". To avoid confusion with the loss used in training, the quantity being differentiated is referred to as a divergence (KL or MSE), and it is the gradient of this divergence that is decomposed.
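The idea of decomposing per-sample divergence gradients can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the toy linear model `W @ x`, the helper name `divergence_grad`, the sizes, and above all the use of a plain SVD to obtain the directions. L3D learns its low-rank components by training; the SVD is only a stand-in that shows what "reconstructing gradients as linear combinations of a few directions" means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: f(x) = W @ x, with W of shape (n_out, n_in).
n_in, n_out, n_samples = 4, 3, 8
W = rng.normal(size=(n_out, n_in))
X = rng.normal(size=(n_samples, n_in))

def divergence_grad(W, x, x_ref):
    """Gradient w.r.t. W of the MSE divergence 0.5 * ||W x - y_ref||^2,
    where y_ref = W x_ref is treated as a fixed reference output."""
    residual = W @ x - W @ x_ref
    return np.outer(residual, x)

# Pair every sample with a randomly chosen reference sample.
ref_idx = rng.permutation(n_samples)

# One flattened gradient per sample: shape (n_samples, n_out * n_in).
grads = np.stack([
    divergence_grad(W, X[i], X[ref_idx[i]]).ravel()
    for i in range(n_samples)
])

# Stand-in for the learned subnetworks: the top-k right singular vectors
# of the gradient matrix (orthonormal directions in parameter space).
k = 3
_, _, Vt = np.linalg.svd(grads, full_matrices=False)
directions = Vt[:k]

# Each gradient is approximated as a linear combination of the k directions.
coeffs = grads @ directions.T   # per-sample coefficients, shape (n_samples, k)
recon = coeffs @ directions     # reconstructed gradients
rel_error = np.linalg.norm(grads - recon) / np.linalg.norm(grads)
print(f"relative reconstruction error with k={k}: {rel_error:.3f}")

# Nudging the parameters along one direction (step size is arbitrary).
delta = 0.1
W_steered = W + delta * directions[0].reshape(n_out, n_in)
```

The coefficients in `coeffs` indicate how strongly each direction contributes to a given sample's gradient, which is the sense in which components can overlap across inputs. A sample that happens to be paired with itself contributes a zero gradient, which the reconstruction handles trivially.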

Unlike APD, which trains components to add up to the original model weights, L3D trains a small set of components to match the gradients for each input. These components can have overlapping effects across different inputs.

https://arxiv.org/pdf/2504.00194v1
