Crosscoder

Creator: Seonglae Cho
Created: 2024 Nov 2 0:14
Edited: 2025 Oct 29 23:48
The architecture has a separate encoder and decoder for each layer while sharing only the latent dictionary for scaling: $f(x) = \mathrm{ReLU}\big(\sum_l W_{enc}^l\, a^l(x) + b_{enc}\big)$ and $\hat{a}^l(x) = W_{dec}^l\, f(x) + b_{dec}^l$, where $W_{enc}^l$ is the source layer-specific encoder, $W_{dec}^l$ the target layer-specific decoder, and $\hat{a}^l$ the reconstructed layer activation from the shared latent.
Each input-output layer pair has its own encoder-decoder weight pair. Unlike cross-layer transcoders (Circuit Tracing), encoders are not shared; instead, layers share the same latent space, achieved through loss-based approximation.
The shared latent space is achieved through alignment via co-training: an alignment loss added during training forces each layer's latents to match, so they converge to a single shared dictionary.
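A minimal sketch of the forward pass described above, assuming per-layer encoder/decoder matrices and a single shared latent vector (all names and dimensions here are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_model, d_latent = 3, 16, 64

# Hypothetical parameters: each layer l has its OWN encoder W_enc[l] and
# decoder W_dec[l]; only the latent dictionary of size d_latent is shared.
W_enc = rng.normal(0, 0.1, (n_layers, d_model, d_latent))
W_dec = rng.normal(0, 0.1, (n_layers, d_latent, d_model))
b_enc = np.zeros(d_latent)
b_dec = np.zeros((n_layers, d_model))

def crosscoder_forward(acts):
    """acts: (n_layers, d_model) — one activation vector per layer."""
    # Shared latent: sum the per-layer encoder preactivations, then ReLU,
    # so every layer reads from and writes to the same latent vector f.
    pre = sum(acts[l] @ W_enc[l] for l in range(n_layers)) + b_enc
    f = np.maximum(pre, 0.0)
    # Each layer is reconstructed from that single shared latent.
    recon = np.stack([f @ W_dec[l] + b_dec[l] for l in range(n_layers)])
    return f, recon

acts = rng.normal(size=(n_layers, d_model))
f, recon = crosscoder_forward(acts)  # f: (64,), recon: (3, 16)
```

Summing the encoder preactivations before the ReLU is what ties the layers to one latent space; a per-layer ReLU would instead give each layer its own code, which is the situation the alignment loss is meant to avoid.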
https://transformer-circuits.pub/2024/crosscoders/index.html

Decoder sparsity loss
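A sketch of the decoder-norm-weighted sparsity penalty, assuming the shapes from the crosscoder setup above (variable names are illustrative): each latent's L1 term is weighted by the sum over layers of its decoder column norms, so a latent that writes to many layers is penalized more than a layer-local one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_model, d_latent = 3, 16, 64
W_dec = rng.normal(0, 0.1, (n_layers, d_latent, d_model))  # per-layer decoders
f = np.maximum(rng.normal(size=d_latent), 0.0)             # shared latent activations

# Weight for latent i: sum over layers l of ||W_dec^{l,i}||_2.
dec_norms = np.linalg.norm(W_dec, axis=2).sum(axis=0)  # shape (d_latent,)

# Sparsity loss: sum_i f_i * sum_l ||W_dec^{l,i}||_2
sparsity_loss = float(np.sum(f * dec_norms))
```

This makes the penalty scale with how broadly (across layers) and how strongly each latent is used, rather than with latent activation alone.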
