CLT (Cross-Layer Transcoder)

Unlike the Crosscoder, which only shares latent dimensions across layers, the CLT pairs each feature's single encoder with multiple decoders that write to subsequent layers. While the PLT (Per-Layer Transcoder) was trained to mimic each layer's MLP input-to-output function in order to learn causality, the CLT scales this approach with n encoder-decoder groups: it keeps one encoder per layer, but gives each layer's features decoders for the MLP outputs of that layer and every subsequent layer (including itself), capturing much more diverse cross-layer causality. With n encoders trained, this yields a triangular, "half fully-connected" decoder structure; various pruning techniques are then combined with correlation-based importance scores and metrics like TWERA and ERA to obtain the final Attribution Graph. Causality is subsequently verified through patching, though the graph construction itself is not based on patching-derived causality.
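
A minimal sketch of this architecture, assuming a toy PyTorch setup: one encoder per layer reads that layer's MLP input, and each source layer owns decoders for its own and every subsequent layer's MLP output, giving the triangular decoder structure described above. All names and dimensions here are illustrative, and plain ReLU stands in for the JumpReLU nonlinearity used in the paper.

```python
import torch
import torch.nn as nn


class CrossLayerTranscoder(nn.Module):
    """Illustrative CLT: n_layers encoders, triangular set of decoders."""

    def __init__(self, n_layers: int, d_model: int, n_feat: int):
        super().__init__()
        self.n_layers = n_layers
        # One encoder per layer, reading that layer's MLP input.
        self.W_enc = nn.Parameter(torch.randn(n_layers, d_model, n_feat) * 0.01)
        # One decoder per (source, target) pair with source <= target:
        # features at layer l write to the MLP outputs of layers l..n-1,
        # the "half fully-connected" structure.
        self.W_dec = nn.ParameterDict({
            f"{src}_{tgt}": nn.Parameter(torch.randn(n_feat, d_model) * 0.01)
            for src in range(n_layers) for tgt in range(src, n_layers)
        })

    def forward(self, mlp_inputs: torch.Tensor):
        # mlp_inputs: (n_layers, batch, d_model), the per-layer MLP inputs.
        # ReLU for simplicity; the paper uses JumpReLU.
        acts = [torch.relu(mlp_inputs[l] @ self.W_enc[l])
                for l in range(self.n_layers)]
        # Each layer's reconstructed MLP output sums contributions from the
        # features of its own and all earlier layers.
        recons = [
            sum(acts[src] @ self.W_dec[f"{src}_{tgt}"] for src in range(tgt + 1))
            for tgt in range(self.n_layers)
        ]
        return acts, recons
```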

Decoder sparsity loss
Building on the Crosscoder's decoder sparsity loss, the CLT shapes its sparsity penalty with tanh, applied to each feature's activation scaled by its decoder norm: tanh behaves linearly near 0 and saturates to 1 for larger values, so strongly active features are not shrunk further, which gives appropriate regularization and stabilizes training.
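
A minimal sketch of this penalty, assuming it takes the form λ · Σ tanh(c · ‖W_dec‖ · a), where the norm is taken over each feature's concatenated cross-layer decoders; `lam` and `c` are hypothetical hyperparameter names.

```python
import torch


def clt_sparsity_loss(acts: torch.Tensor, dec_norms: torch.Tensor,
                      lam: float = 1e-3, c: float = 4.0) -> torch.Tensor:
    """acts: (batch, n_feat) non-negative feature activations.
    dec_norms: (n_feat,) L2 norms of each feature's concatenated decoders."""
    # tanh is ~linear near 0 (a gentle L1-like pull on small activations)
    # and saturates to 1 for large values (approximating an L0 count),
    # so features that are already strongly active are not shrunk further.
    return lam * torch.tanh(c * dec_norms * acts).sum(dim=-1).mean()
```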

Seonglae Cho