Sparse weight regularization is good both for sparse representations and for overall reconstruction.

- SAE regularization produces more interpretable models (LessWrong): Sparse Autoencoders (SAEs) are useful for providing insight into how a model processes and represents information.
  https://www.lesswrong.com/posts/sYFNGRdDQYQrSJAd8/sae-regularization-produces-more-interpretable-models

- Sparse Crosscoders for Cross-Layer Features and Model Diffing (Transformer Circuits): This note will cover some theoretical examples motivating crosscoders, and then present preliminary experiments applying them to cross-layer superposition and model diffing. We also briefly discuss the theory of how crosscoders might simplify circuit analysis, but leave results on this for a future update.
  https://transformer-circuits.pub/2024/crosscoders/index.html
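As a minimal sketch of the idea behind SAE sparsity regularization (names, dimensions, and the random initialization here are illustrative assumptions, not taken from either post): the loss trades off reconstruction error against an L1 penalty on feature activations, which pushes the learned features toward sparsity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: activations x in R^d, an overcomplete set of m features.
d, m, n = 8, 32, 256
X = rng.normal(size=(n, d))

# Hypothetical SAE parameters, randomly initialized for illustration.
W_enc = rng.normal(scale=0.1, size=(d, m))
b_enc = np.zeros(m)
W_dec = rng.normal(scale=0.1, size=(m, d))
b_dec = np.zeros(d)

def sae_loss(X, l1_coeff=1e-3):
    """MSE reconstruction term plus L1 sparsity penalty on feature activations."""
    f = np.maximum(X @ W_enc + b_enc, 0.0)      # ReLU feature activations
    X_hat = f @ W_dec + b_dec                   # reconstruction of the input
    recon = np.mean((X - X_hat) ** 2)           # reconstruction error
    sparsity = np.mean(np.abs(f).sum(axis=1))   # mean L1 norm of activations
    return recon + l1_coeff * sparsity

print(sae_loss(X))
```

Raising `l1_coeff` yields sparser (and often more interpretable) features at some cost in reconstruction quality, which is the trade-off the note above is pointing at.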