
SAE Weight Regularizer

Creator: Seonglae Cho
Created: 2025 Jan 27 13:29
Editor: Seonglae Cho
Edited: 2025 Mar 8 12:29
Sparse SAE weights are good for both sparse representations and overall reconstruction quality.
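One way to make this concrete is to add a weight penalty to the usual SAE training objective. The sketch below is a minimal illustration in plain Python, assuming a ReLU encoder, a linear decoder, an L1 penalty on feature activations, and an L1 penalty on the encoder and decoder weights. The function name `sae_loss`, the coefficients `act_coef` and `weight_coef`, and the choice of L1 as the weight regularizer are assumptions made for illustration; they are not taken from the linked posts, which may use different penalties.

```python
def affine(W, b, v):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def l1(W):
    """Sum of absolute values of all entries of a matrix (list of rows)."""
    return sum(abs(w) for row in W for w in row)


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, act_coef=1e-3, weight_coef=1e-4):
    """Loss for one input vector x (hypothetical helper, not a library API)."""
    # Encoder: ReLU feature activations.
    f = [max(0.0, a) for a in affine(W_enc, b_enc, x)]
    # Decoder: linear reconstruction of the input.
    x_hat = affine(W_dec, b_dec, f)
    # Squared reconstruction error.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # Activation sparsity penalty (standard SAE term).
    act_l1 = sum(abs(a) for a in f)
    # Weight sparsity penalty (the assumed weight regularizer).
    weight_l1 = l1(W_enc) + l1(W_dec)
    total = recon + act_coef * act_l1 + weight_coef * weight_l1
    return {"total": total, "recon": recon, "act_l1": act_l1, "weight_l1": weight_l1}


# Toy example: 2-dimensional input, 3 dictionary features.
x = [1.0, 2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]
out = sae_loss(x, W_enc, b_enc, W_dec, b_dec)
```

In this toy case the reconstruction is exact, so the loss consists only of the two sparsity terms. In practice the same objective would be written in an autodiff framework and minimized over batches; the plain-Python form is used here only to keep the example self-contained.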
SAE regularization produces more interpretable models (LessWrong)
Sparse Autoencoders (SAEs) are useful for providing insight into how a model processes and represents information. A key goal is to represent languag…
https://www.lesswrong.com/posts/sYFNGRdDQYQrSJAd8/sae-regularization-produces-more-interpretable-models
Sparse Crosscoders for Cross-Layer Features and Model Diffing
This note will cover some theoretical examples motivating crosscoders, and then present preliminary experiments applying them to cross-layer superposition and model diffing. We also briefly discuss the theory of how crosscoders might simplify circuit analysis, but leave results on this for a future update.
https://transformer-circuits.pub/2024/crosscoders/index.html
 

Copyright Seonglae Cho