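A neural network component that projects its input into a sparse, typically wider latent space and reconstructs the output of the component it approximates (such as an MLP layer) rather than its own input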
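The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the names `init_transcoder`, `topk_relu`, `transcoder_forward`, and `transcoder_loss` are hypothetical, and the TopK sparsity rule is an assumption (an L1 penalty on ReLU activations is another common choice). The key point is that the loss compares the reconstruction with a target output `y_target`, for example an MLP layer's output, instead of the input `x`.

```python
import numpy as np

def init_transcoder(d_in, d_latent, d_out, seed=0):
    """Random parameters for a toy transcoder (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.1, (d_in, d_latent)),
        "b_enc": np.zeros(d_latent),
        "W_dec": rng.normal(0.0, 0.1, (d_latent, d_out)),
        "b_dec": np.zeros(d_out),
    }

def topk_relu(pre, k):
    """Keep the k largest pre-activations per row, zero the rest, then apply ReLU."""
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    mask = np.zeros_like(pre, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, np.maximum(pre, 0.0), 0.0)

def transcoder_forward(params, x, k):
    """Encode x into sparse latents z, then decode into a prediction of the target output."""
    z = topk_relu(x @ params["W_enc"] + params["b_enc"], k)
    y_hat = z @ params["W_dec"] + params["b_dec"]
    return z, y_hat

def transcoder_loss(params, x, y_target, k):
    """Mean squared error against the component's output, not against x itself."""
    _, y_hat = transcoder_forward(params, x, k)
    return float(np.mean(np.sum((y_hat - y_target) ** 2, axis=-1)))
```

Passing `y_target = x` turns the same code into a plain sparse autoencoder objective, which is the only difference between the two setups in this sketch.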
Per-layer transcoder
Transcoders
Transcoders enable fine-grained interpretable circuit analysis for language models — AI Alignment Forum
Summary * We present a method for performing circuit analysis on language models using "transcoders," an occasionally-discussed variant of SAEs tha…
https://www.alignmentforum.org/posts/YmkjnWtZGLbHRbzrP/transcoders-enable-fine-grained-interpretable-circuit
Transcoders Beat Sparse Autoencoders for Interpretability
- Compared with SAEs, transcoder features have a narrower distribution of interpretations and are more monosemantic (each feature activates for a single meaning).
- Sparse probing performance is similar to, or slightly better than, that of SAEs.
A skip transcoder can replace an SAE on the residual stream when an identity skip connection is added.
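A skip transcoder adds a linear path from input to output alongside the sparse path. The sketch below is an assumption-laden illustration rather than the paper's code: `skip_transcoder_forward` and its parameter names are hypothetical, a plain ReLU stands in for whatever sparsity rule is actually used, and the identity skip is modeled by fixing `W_skip` to the identity matrix.

```python
import numpy as np

def skip_transcoder_forward(x, W_enc, b_enc, W_dec, b_dec, W_skip):
    """Sparse path (encode, ReLU, decode) plus a linear skip path from x to the output."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)   # latent activations
    return z @ W_dec + b_dec + x @ W_skip    # sparse reconstruction + skip term

# Identity skip on a residual stream of width d: input and output share a dimension,
# so W_skip can be the d x d identity (assumed here to be fixed rather than learned).
d, d_latent = 6, 24
rng = np.random.default_rng(0)
x = rng.normal(size=(3, d))
W_enc = rng.normal(0.0, 0.1, (d, d_latent))
W_dec = rng.normal(0.0, 0.1, (d_latent, d))
y_hat = skip_transcoder_forward(x, W_enc, np.zeros(d_latent), W_dec, np.zeros(d), np.eye(d))
```

With the identity skip in place, the sparse latents only need to model the change relative to `x`, which is one way to read the note's claim that this variant can stand in for a residual-stream SAE.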

Seonglae Cho