Transcoder

Creator

Seonglae Cho

Created

2025 Apr 6 17:39

Editor

Seonglae Cho

Edited

2026 Feb 5 16:18

Refs

Causal abstraction

Circuit Discovery

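A neural network component that projects a layer's input into a wide, sparsely activating latent space and reconstructs that layer's output (typically an MLP's output), rather than reconstructing its own input as a sparse autoencoder (SAE) does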
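A minimal sketch of that definition, assuming a single-layer ReLU encoder, a linear decoder, and an L1 sparsity penalty. The function names, dimensions, and the random stand-in for the MLP output are hypothetical and chosen only for illustration; real transcoders are trained on activations captured from a model.

```python
import random

def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def transcoder_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode x into non-negative latents z, then decode z into y_hat."""
    z = [max(0.0, p + b) for p, b in zip(matvec(W_enc, x), b_enc)]  # ReLU
    y_hat = [o + b for o, b in zip(matvec(W_dec, z), b_dec)]
    return z, y_hat

def transcoder_loss(y_hat, y_target, z, l1_coeff):
    """Mean squared error against the target OUTPUT plus an L1 penalty on z."""
    mse = sum((a - t) ** 2 for a, t in zip(y_hat, y_target)) / len(y_target)
    return mse + l1_coeff * sum(abs(v) for v in z)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d_model, d_latent = 4, 16            # hypothetical sizes; d_latent > d_model
W_enc = rand_matrix(d_latent, d_model)
W_dec = rand_matrix(d_model, d_latent)
b_enc = [0.0] * d_latent
b_dec = [0.0] * d_model

x = [random.gauss(0.0, 1.0) for _ in range(d_model)]        # MLP input
mlp_out = [random.gauss(0.0, 1.0) for _ in range(d_model)]  # stand-in for the MLP output

z, y_hat = transcoder_forward(x, W_enc, b_enc, W_dec, b_dec)
# An SAE would compare y_hat with x; a transcoder compares it with mlp_out.
loss = transcoder_loss(y_hat, mlp_out, z, l1_coeff=1e-3)
```

With untrained random weights the latents are not yet sparse; in this sketch sparsity would only emerge from minimizing the L1 term during training.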

Per-layer transcoder

Transcoders

Skip Transcoder

Multi-token transcoder

Gemma Transcoder

LLaMa Transcoder

Transcoder

https://arxiv.org/pdf/2406.11944

Transcoders enable fine-grained interpretable circuit analysis for language models — AI Alignment Forum

Summary * We present a method for performing circuit analysis on language models using "transcoders," an occasionally-discussed variant of SAEs tha…

https://www.alignmentforum.org/posts/YmkjnWtZGLbHRbzrP/transcoders-enable-fine-grained-interpretable-circuit

Transcoders Beat Sparse Autoencoders for Interpretability

Compared with SAEs, transcoder latents show a narrower distribution of interpretations and are more monosemantic (each feature activates for a single meaning).

Sparse probing performance is similar to or slightly better than that of SAEs.

A skip transcoder can replace an SAE on the residual stream (when an identity skip connection is added).

https://arxiv.org/pdf/2501.18823
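As a rough sketch of what the skip connection adds, a skip transcoder can be written as a sparse encode-decode path plus a linear map applied directly to the input. The symbols below are assumed notation rather than the paper's exact formulation: $\sigma$ is the sparsifying activation, $W_{\text{enc}}, W_{\text{dec}}, b_{\text{enc}}, b_{\text{dec}}$ are the encoder and decoder parameters, and $W_{\text{skip}}$ is the skip matrix.

```latex
\begin{aligned}
z &= \sigma\left(W_{\text{enc}}\, x + b_{\text{enc}}\right) \\
\hat{y} &= W_{\text{dec}}\, z + W_{\text{skip}}\, x + b_{\text{dec}}
\end{aligned}
```

Under this notation, $W_{\text{skip}} = 0$ gives a plain transcoder, and fixing $W_{\text{skip}}$ to the identity corresponds to the identity-skip case mentioned for the residual stream.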
