Transcoders Beat Sparse Autoencoders for Interpretability
- Narrower distribution of feature interpretations and stronger monosemanticity (each feature activates for a single meaning).
- Sparse probing performance is similar to or slightly better than that of SAEs.
A skip transcoder can replace an SAE on the residual stream (when an identity skip connection is added).
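
A rough sketch of what a skip transcoder's forward pass might look like: a sparse encoder/decoder path plus a linear skip path, summed. This is illustrative only and not taken from the paper's code. The names (`skip_transcoder`, `W_enc`, `W_dec`, `W_skip`), the TopK sparsity rule, and the toy sizes are all assumptions. Setting `W_skip` to the identity gives the "identity skip" variant mentioned above.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def topk(z, k):
    """Keep the k largest entries of z and zero the rest (assumed sparsity rule)."""
    keep = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(z)]

def identity(d):
    """d x d identity matrix, used here as the skip connection."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def skip_transcoder(x, W_enc, b_enc, W_dec, b_dec, W_skip, k):
    """Compute W_dec @ TopK(W_enc @ x + b_enc) + W_skip @ x + b_dec."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    latents = topk(pre, k)               # sparse, hopefully interpretable features
    sparse_out = matvec(W_dec, latents)  # contribution of the sparse path
    skip_out = matvec(W_skip, x)         # linear skip path
    out = [s + r + b for s, r, b in zip(sparse_out, skip_out, b_dec)]
    return out, latents

# Toy example with hypothetical sizes: 4-dim activations, 16 latents, 3 active.
rng = random.Random(0)
d_model, d_latent, k = 4, 16, 3
W_enc = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_latent)]
b_enc = [0.0] * d_latent
W_dec = [[0.0] * d_latent for _ in range(d_model)]  # zero decoder, for illustration only
b_dec = [0.0] * d_model
W_skip = identity(d_model)                          # identity skip

x = [rng.gauss(0, 1) for _ in range(d_model)]
y, latents = skip_transcoder(x, W_enc, b_enc, W_dec, b_dec, W_skip, k)
```

With a zero decoder and an identity skip, the output equals the input, so in this toy setup the sparse path only has to model whatever the skip path does not already carry.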
https://arxiv.org/pdf/2501.18823

Seonglae Cho