SAE Weight Regularizer

Creator

Created

2025 Jan 27 13:29

Editor

Edited

2025 Mar 8 12:29

Refs

Sparse weights are good for sparse representations and for overall reconstruction.
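One way to read the note above is as an extra L1 penalty on the SAE's weights, added to the usual reconstruction and activation-sparsity terms. The sketch below is a minimal pure-Python illustration under that assumption. The function name `sae_loss`, the coefficients `act_coef` and `weight_coef`, and the exact form of the penalty are hypothetical and are not taken from the linked posts.

```python
def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, act_coef=1e-3, weight_coef=1e-4):
    """Squared reconstruction error + L1 on activations + L1 on weights.

    Assumed (hypothetical) shapes: W_enc is n_latent x d_in,
    W_dec is d_in x n_latent, both given as lists of rows.
    """
    # Encoder: affine map followed by ReLU gives the latent activations.
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    f = [max(0.0, p) for p in pre]

    # Decoder: affine map back to the input space.
    x_hat = [r + b for r, b in zip(matvec(W_dec, f), b_dec)]

    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    act_l1 = sum(abs(a) for a in f)  # encourages sparse representations
    weight_l1 = sum(abs(w) for row in W_enc + W_dec for w in row)  # encourages sparse weights
    return recon + act_coef * act_l1 + weight_coef * weight_l1


# Toy usage with identity encoder and decoder weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
loss = sae_loss([1.0, 2.0], identity, zeros, identity, zeros,
                act_coef=1.0, weight_coef=1.0)
```

Setting `weight_coef=0` recovers a plain SAE objective, so the effect of the weight term can be compared directly against a baseline.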

SAE regularization produces more interpretable models — LessWrong

Sparse Autoencoders (SAEs) are useful for providing insight into how a model processes and represents information. A key goal is to represent languag…

https://www.lesswrong.com/posts/sYFNGRdDQYQrSJAd8/sae-regularization-produces-more-interpretable-models

Sparse Crosscoders for Cross-Layer Features and Model Diffing

This note will cover some theoretical examples motivating crosscoders, and then present preliminary experiments applying them to cross-layer superposition and model diffing. We also briefly discuss the theory of how crosscoders might simplify circuit analysis, but leave results on this for a future update.

https://transformer-circuits.pub/2024/crosscoders/index.html
