Transcoder

Creator: Seonglae Cho
Created: 2025 Apr 6 17:39
Edited: 2026 Feb 5 16:18

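A neural network component that projects its input into a sparse, typically wider latent space and, from that latent code, reconstructs the output of a target component (usually an MLP sublayer) rather than its own input, which is what distinguishes it from a sparse autoencoder (SAE).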
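A minimal sketch of the definition above, in pure Python so it runs without dependencies. The class name `Transcoder`, the ReLU encoder, the L1 sparsity penalty, and all dimensions are illustrative assumptions rather than details taken from this page or from a specific library; the `target` vector stands in for the output of the MLP sublayer being approximated.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def init_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class Transcoder:
    """Hypothetical minimal transcoder: encode to a wide latent, decode to the target output."""

    def __init__(self, d_in, d_latent, d_out, seed=0):
        rng = random.Random(seed)
        self.W_enc = init_matrix(d_latent, d_in, rng)
        self.b_enc = [0.0] * d_latent
        self.W_dec = init_matrix(d_out, d_latent, rng)
        self.b_dec = [0.0] * d_out

    def encode(self, x):
        # Non-negative latent activations; training with an L1 penalty pushes most to zero.
        return relu(add(matvec(self.W_enc, x), self.b_enc))

    def decode(self, z):
        return add(matvec(self.W_dec, z), self.b_dec)

    def loss(self, x, target, l1_coeff=1e-3):
        # Reconstruction error against the target component's output (not against x),
        # plus an L1 penalty on the latent code (assumed sparsity objective).
        z = self.encode(x)
        y_hat = self.decode(z)
        mse = sum((a - b) ** 2 for a, b in zip(y_hat, target)) / len(target)
        l1 = sum(z)  # z >= 0 after ReLU, so the sum equals the L1 norm
        return mse + l1_coeff * l1

tc = Transcoder(d_in=4, d_latent=16, d_out=4)
x = [1.0, -0.5, 0.25, 2.0]        # stand-in for the MLP input
target = [0.0, 1.0, 0.0, -1.0]    # stand-in for the MLP output on x
z = tc.encode(x)
y_hat = tc.decode(z)
```

The only structural difference from an SAE in this sketch is the `target` argument: an SAE would pass `x` itself as the reconstruction target.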

Per-layer transcoder
Transcoders
 
 
 

Transcoder

arxiv.org
Transcoders enable fine-grained interpretable circuit analysis for language models — AI Alignment Forum
Summary * We present a method for performing circuit analysis on language models using "transcoders," an occasionally-discussed variant of SAEs tha…

Transcoders Beat Sparse Autoencoders for Interpretability

  • Transcoder features have a narrower distribution of interpretations and are more monosemantic (each feature activates for a single meaning) than SAE features.
  • Sparse probing performance is similar to or slightly better than that of SAEs.
  • A Skip Transcoder can replace an SAE on the residual stream when an identity skip connection is added.
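As a rough illustration of the skip-connection variant, the forward pass can be written as follows. The symbols ($W_{\text{enc}}$, $W_{\text{dec}}$, $W_{\text{skip}}$, and the biases) are assumed notation, not taken from this page, and reading "identity skip" as $W_{\text{skip}} = I$ is one plausible interpretation.

```latex
\begin{aligned}
z &= \operatorname{ReLU}\!\left(W_{\text{enc}}\, x + b_{\text{enc}}\right) \\
\hat{y} &= W_{\text{dec}}\, z + b_{\text{dec}} + W_{\text{skip}}\, x \\
W_{\text{skip}} &= I \quad \text{(identity skip; requires } d_{\text{in}} = d_{\text{out}}\text{)}
\end{aligned}
```

The dimension constraint is why the identity case fits the residual stream, where input and output share the same width.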
arxiv.org
 
 
