Local Loss Landscape Decomposition
L3D identifies a set of low-rank subnetworks: directions in parameter space, a subset of which can reconstruct the gradient of the loss between any sample's output and a reference output vector. It finds these 'subnetwork (circuit)' directions by reconstructing the loss gradients between samples and randomly chosen reference samples as linear combinations of a small number of directions (subnetworks). Moving the parameters along a recovered direction can selectively turn a specific functionality on or off, similar to how a Steering Vector is used.
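The idea of rebuilding per-sample gradients from a few shared directions, and then steering along one of them, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the gradients are synthetic, the sizes are arbitrary, and truncated SVD is used as a simple stand-in for finding directions. It is not the procedure L3D uses to learn its subnetworks, and it does not model their low-rank structure within parameter tensors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-sample gradients: 50 samples, 20 parameters,
# generated from 3 hidden directions so that a small basis exists.
true_dirs = rng.normal(size=(3, 20))
coeffs = rng.normal(size=(50, 3))
G = coeffs @ true_dirs                      # (50, 20), one gradient per row

# Truncated SVD (stand-in technique): rows of V are orthonormal
# candidate directions in parameter space.
k = 3
_, _, Vt = np.linalg.svd(G, full_matrices=False)
V = Vt[:k]                                  # (k, 20)

# Each gradient is rebuilt as a linear combination of the k directions.
A = G @ V.T                                 # (50, k) per-sample coefficients
G_hat = A @ V
rel_err = np.linalg.norm(G - G_hat) / np.linalg.norm(G)

# Steering: shift the parameters along one direction by a chosen step size.
theta = rng.normal(size=20)                 # hypothetical flattened parameters
theta_steered = theta + 0.5 * V[0]
```

Because the synthetic gradients were built from exactly three directions, three SVD directions reconstruct them almost perfectly. With gradients from a real model the reconstruction would only be approximate, and the rows of `A` would indicate which directions are active for which samples.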
A model's parameter space contains degrees of freedom (subnetworks) that remain "performance-invariant under multiple directional changes", not only over the entire dataset but also over specific subsets of the data. In other words, distinct "circuits (directions)" are hidden in both the global and the local distributions. From here on, these directions in parameter space are called "subnetworks": units of low-dimensional parameter circuits that activate on a per-sample basis.
To find these 'local-specific circuits', we decompose the gradient of the loss between each sample's output and the output of a randomly chosen reference. This gradient effectively reveals the "parameter directions that only affect this sample". To avoid confusion with the 'loss' used in training, this quantity is computed as a divergence (KL or MSE) between the two outputs, and the gradient is taken of that divergence.
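As a rough illustration of the gradient described above, the sketch below computes the gradient of an MSE divergence between each sample's output and the output of a randomly chosen reference sample. The linear model, the sizes, and the helper names `mse_divergence` and `divergence_gradient` are all hypothetical, chosen only so the gradient has a closed form. A real network would use automatic differentiation, and a KL divergence could be substituted for MSE.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse_divergence(W, x, y_ref):
    """0.5 * squared distance between the output W @ x and a reference output."""
    diff = W @ x - y_ref
    return 0.5 * float(diff @ diff)

def divergence_gradient(W, x, y_ref):
    """Analytic gradient of mse_divergence with respect to W (same shape as W)."""
    return np.outer(W @ x - y_ref, x)

# Toy linear "model" with 3 outputs and 4 inputs (hypothetical sizes).
W = rng.normal(size=(3, 4))
X = rng.normal(size=(8, 4))                 # 8 samples

# Pair each sample with the output of a randomly chosen reference sample.
ref_idx = rng.integers(0, len(X), size=len(X))
Y_ref = X[ref_idx] @ W.T

# One flattened gradient per sample: each row of G is a point in parameter space.
G = np.stack([divergence_gradient(W, x, y).ravel() for x, y in zip(X, Y_ref)])
```

The rows of `G` are the per-sample gradients that a decomposition would then try to express through a small set of shared directions. If a sample happens to be paired with itself as the reference, its divergence and gradient are zero.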