CLT


CLT (Cross-Layer Transcoder)

Unlike Crosscoder, which only shares latent dimensions, CLT shares encoders and trains separate decoders for subsequent layers. While the PLT (per-layer transcoder) was trained to mimic each layer's MLP input-to-output function in order to learn causality, CLT scales this approach with n encoder-decoder pairs: each layer keeps its own encoder, but its features decode into the MLP outputs of that layer and all subsequent layers (including itself), capturing much more diverse cross-layer causality. With one encoder trained per layer, this yields a triangular, half fully-connected graph; various pruning techniques are then combined with correlation-based importance scores and metrics such as TWERA and ERA to obtain the final Attribution Graph. Causality is then verified through patching, though the graph construction itself is not based on patching-derived causality.
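A minimal PyTorch sketch of this triangular encoder-decoder layout, assuming ReLU features and linear maps (the class name, argument names, and nonlinearity are illustrative assumptions, not the reference implementation):

```python
import torch
import torch.nn as nn

class CrossLayerTranscoder(nn.Module):
    """Sketch of a CLT: one encoder per layer; features at layer l
    decode into the MLP outputs of layers l..L-1 (triangular decoders)."""

    def __init__(self, n_layers: int, d_model: int, d_features: int):
        super().__init__()
        self.n_layers = n_layers
        # One encoder per layer, reading the residual stream before that MLP.
        self.encoders = nn.ModuleList(
            [nn.Linear(d_model, d_features) for _ in range(n_layers)]
        )
        # Lower-triangular grid: decoders[l][t - l] maps layer-l features
        # to the MLP output of target layer t, for every t >= l.
        self.decoders = nn.ModuleList(
            [nn.ModuleList(
                [nn.Linear(d_features, d_model, bias=False)
                 for _ in range(n_layers - l)]
            ) for l in range(n_layers)]
        )

    def forward(self, resid_pre: list[torch.Tensor]):
        """resid_pre[l]: residual-stream input to MLP l, shape (..., d_model).
        Returns per-layer feature activations and reconstructed MLP outputs."""
        acts = [torch.relu(self.encoders[l](resid_pre[l]))
                for l in range(self.n_layers)]
        recon = []
        for t in range(self.n_layers):
            # MLP output at layer t is the sum of contributions from the
            # features of every layer l <= t.
            out = sum(self.decoders[l][t - l](acts[l]) for l in range(t + 1))
            recon.append(out)
        return acts, recon
```

With L layers this trains L encoders but L(L+1)/2 decoders, which is where the "half fully-connected" structure comes from.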

Decoder sparsity loss

Building on Crosscoder's decoder sparsity loss, CLT wraps the penalty in a tanh: the penalty behaves linearly near 0 (recovering an L1-like term) and saturates toward 1 for larger values, so strongly active features are not over-penalized, which stabilizes training.