The architecture has separate encoder and decoder weights for each layer while sharing only the latent dictionary for scaling
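A minimal sketch of the per-layer weights and shared latent dictionary described above, assuming a ReLU encoder that sums each layer's contribution into one latent vector. The names `crosscoder_forward`, `W_enc`, `W_dec`, `b_enc`, and `b_dec` are illustrative, not taken from the source, and plain Python lists stand in for tensors.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def crosscoder_forward(acts, W_enc, W_dec, b_enc, b_dec):
    """Encode per-layer activations into one shared latent vector,
    then decode that vector back to every layer.

    acts[l]  : activation vector of layer l (length d_model)
    W_enc[l] : encoder of layer l, shape (n_latents x d_model)
    W_dec[l] : decoder of layer l, shape (d_model x n_latents)
    b_enc    : shared encoder bias (length n_latents)
    b_dec[l] : decoder bias of layer l (length d_model)
    """
    # Each layer has its own encoder; contributions are summed into
    # a single pre-activation over the shared latent dictionary.
    pre = [float(b) for b in b_enc]
    for x, W in zip(acts, W_enc):
        pre = [p + c for p, c in zip(pre, matvec(W, x))]
    latents = [max(0.0, p) for p in pre]  # ReLU (assumed nonlinearity)

    # Each layer has its own decoder reading the same latent vector.
    recons = [
        [r + b for r, b in zip(matvec(W, latents), bias)]
        for W, bias in zip(W_dec, b_dec)
    ]
    return latents, recons


# Toy usage: 2 layers, d_model = 2, n_latents = 3 (hand-picked weights).
enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
W_enc = [enc, enc]
W_dec = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
]
latents, recons = crosscoder_forward(
    acts=[[1.0, 2.0], [0.5, -1.0]],
    W_enc=W_enc,
    W_dec=W_dec,
    b_enc=[0.0, 0.0, 0.0],
    b_dec=[[0.0, 0.0], [0.0, 0.0]],
)
```

Only `latents` is shared across layers; every weight matrix is indexed by layer, which is the split the note describes.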
Acausal crosscoder
Crosscoders
CrossCoder (2024)
with cross-fine-tuning model diffing & scaling transferability, by diffing within the same architecture
BatchTopK crosscoder to prevent Complete Shrinkage and Latent Decoupling for chat models
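A rough sketch of the BatchTopK selection step mentioned above, assuming the usual formulation: keep the k × batch_size largest latent activations across the whole batch and zero the rest, so sparsity is enforced without an L1 penalty. The function name `batch_topk` is hypothetical, and this simplified version ranks raw activations only, ignoring any decoder-norm weighting a crosscoder variant may apply.

```python
def batch_topk(latents, k):
    """Keep the k * batch_size largest activations across the batch.

    latents : list of per-sample activation lists (batch x n_latents)
    k       : average number of active latents allowed per sample
    """
    budget = k * len(latents)

    # Flatten to (value, sample index, latent index) and rank globally,
    # so the budget is shared by the batch rather than fixed per sample.
    flat = [
        (value, i, j)
        for i, row in enumerate(latents)
        for j, value in enumerate(row)
    ]
    flat.sort(key=lambda item: item[0], reverse=True)

    out = [[0.0] * len(row) for row in latents]
    for value, i, j in flat[:budget]:
        if value > 0:  # never activate on non-positive pre-activations
            out[i][j] = value
    return out


# Toy usage: with k = 1 and 2 samples the batch budget is 2 activations.
sparse = batch_topk([[0.9, 0.1, 0.8], [0.2, 0.05, 0.0]], k=1)
```

In the toy batch both surviving activations fall in the first sample, showing that the number of active latents can vary per sample while the batch average stays at k.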