Sparse weight regularization is good for both sparse representations and overall reconstruction quality.
SAE regularization produces more interpretable models — LessWrong
Sparse Autoencoders (SAEs) are useful for providing insight into how a model processes and represents information. A key goal is to represent languag…
https://www.lesswrong.com/posts/sYFNGRdDQYQrSJAd8/sae-regularization-produces-more-interpretable-models
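A minimal sketch of the kind of sparsity-regularized autoencoder objective discussed here, not the post's exact setup: reconstruction error plus an L1 penalty on the hidden activations. All names, shapes, and the coefficient value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32                      # hypothetical dimensions
W_enc = rng.normal(0, 0.1, (d_model, d_hidden))
W_dec = rng.normal(0, 0.1, (d_hidden, d_model))
b_enc = np.zeros(d_hidden)

def sae_loss(x, l1_coeff=1e-3):
    # ReLU feature activations, then a linear reconstruction
    f = np.maximum(x @ W_enc + b_enc, 0.0)
    x_hat = f @ W_dec
    recon = np.mean((x - x_hat) ** 2)          # reconstruction term
    sparsity = np.abs(f).sum(axis=-1).mean()   # L1 on activations
    return recon + l1_coeff * sparsity

x = rng.normal(size=(4, d_model))
print(sae_loss(x))
```

Raising `l1_coeff` trades reconstruction fidelity for sparser (and, per the post, more interpretable) feature activations.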
Sparse Crosscoders for Cross-Layer Features and Model Diffing
This note will cover some theoretical examples motivating crosscoders, and then present preliminary experiments applying them to cross-layer superposition and model diffing. We also briefly discuss the theory of how crosscoders might simplify circuit analysis, but leave results on this for a future update.
https://transformer-circuits.pub/2024/crosscoders/index.html
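A rough sketch of a crosscoder-style objective under stated assumptions (names, shapes, and the decoder-norm-weighted sparsity term are my reading of the note, not verbatim from it): one shared dictionary of features reads activations from several layers and reconstructs every layer, so a single feature can account for cross-layer superposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_model, d_feat = 3, 8, 32           # hypothetical dimensions
W_enc = rng.normal(0, 0.1, (n_layers, d_model, d_feat))
W_dec = rng.normal(0, 0.1, (n_layers, d_feat, d_model))

def crosscoder_loss(acts, l1_coeff=1e-3):
    # acts: (n_layers, batch, d_model) activations, one slice per layer
    pre = sum(acts[l] @ W_enc[l] for l in range(n_layers))
    f = np.maximum(pre, 0.0)                   # shared feature activations
    recon = sum(np.mean((acts[l] - f @ W_dec[l]) ** 2)
                for l in range(n_layers))
    # sparsity weighted by each feature's decoder norms summed over layers
    dec_norms = np.linalg.norm(W_dec, axis=-1).sum(axis=0)   # (d_feat,)
    sparsity = (np.abs(f) * dec_norms).sum(axis=-1).mean()
    return recon + l1_coeff * sparsity

acts = rng.normal(size=(n_layers, 4, d_model))
print(crosscoder_loss(acts))
```

Because the feature activations `f` are shared across layers, comparing per-layer decoder norms shows where a feature lives, which is what makes crosscoders useful for model diffing.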

Seonglae Cho