Gradient Routing

Creator
Seonglae Cho
Created
2024 Dec 18 15:44
Edited
2025 Nov 4 2:22

Gradient Masking during Backpropagation to Limit the Effect of Data Points on Specific Model Subcomponents

This approach drives each subcomponent to specialize in a limited set of features, making it easier to remove potentially harmful capabilities before public deployment. While the method still suffers performance degradation (an Alignment Tax) when a subcomponent is detached, it enables fundamental control over the model's internal structure and features, even for publicly deployed models.
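As a concrete illustration, here is a minimal PyTorch sketch of the idea, not the authors' implementation: the forward pass is unchanged, while a per-sample mask applied with a detach trick zeroes the backward gradient for masked units, so each half of a hidden layer learns only from its assigned classes. All names, sizes, and the class split are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientRouter:
    """Identity in the forward pass; masks gradients in the backward pass."""
    @staticmethod
    def route(x, mask):
        # Forward value equals x, since x*mask + x*(1-mask) == x.
        # Backward gradient w.r.t. x is grad * mask, so units with mask == 0
        # receive no learning signal from that batch element.
        return x * mask + (x * (1 - mask)).detach()

# Toy MLP: route classes 0-4 to the first half of the hidden layer and
# classes 5-9 to the second half (a hypothetical split for illustration).
layer1, layer2 = nn.Linear(784, 128), nn.Linear(128, 10)
x, y = torch.rand(32, 784), torch.randint(0, 10, (32,))

hidden = torch.relu(layer1(x))
mask = torch.zeros_like(hidden)
mask[y < 5, :64] = 1.0    # samples of classes 0-4 update only units 0-63
mask[y >= 5, 64:] = 1.0   # samples of classes 5-9 update only units 64-127
hidden = GradientRouter.route(hidden, mask)

loss = F.cross_entropy(layer2(hidden), y)
loss.backward()  # layer1 gradients flow only through each sample's unmasked units
```

Ablating the half that absorbed the unwanted data then removes those capabilities, at the cost of the performance drop described above.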
It even enables the latent dimensions to be interpretable in terms of Monosemanticity, which is usually achieved through a Sparse Autoencoder.
https://www.lesswrong.com/posts/8zDjhJNoFhMuHB5Kc/creating-interpretable-latent-spaces-with-gradient-routing
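The linked post trains an MNIST autoencoder in this way; a hedged sketch of that setup, reusing the GradientRouter helper above (the toy encoder/decoder and dummy data are stand-ins, not the post's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(784, 10)   # toy encoder with one latent dimension per digit
decoder = nn.Linear(10, 784)   # toy decoder back to pixel space

x = torch.rand(32, 784)            # dummy flattened images
y = torch.randint(0, 10, (32,))    # dummy digit labels

z = encoder(x)
mask = F.one_hot(y, num_classes=10).float()   # digit d trains latent dim d only
z = GradientRouter.route(z, mask)             # helper from the sketch above

loss = F.mse_loss(torch.sigmoid(decoder(z)), x)
loss.backward()   # each latent dimension accumulates gradient from one digit class
```

The intent is that each latent dimension comes to represent a single digit class, yielding monosemantic latents without a separately trained Sparse Autoencoder.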
This approach also demonstrates the possibility of developing separate Brain Lobe-like modules through selective gradient propagation in future models, similar to how the left and right brain hemispheres specialize.

Steering Scalar

 
 
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks — LessWrong
Isolate capabilities to known parts of a neural network. Helps with interpretability, robust unlearning, and scalable oversight.
