Crosscoder

The architecture keeps a separate encoder $W_{\text{enc}}^{\ell}$ and decoder $W_{\text{dec}}^{\ell'}$ for each layer while sharing only the latent dictionary (which is what lets it scale), where $W_{\text{enc}}^{\ell}$ is the source-layer-specific encoder, $W_{\text{dec}}^{\ell'}$ is the target-layer-specific decoder, and $\hat{a}^{\ell'}$ is the target-layer activation reconstructed from the source latent.
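In symbols (notation mine, chosen to match the conventions of the crosscoders post rather than copied from it), the source-layer activation $a^{\ell}$ is encoded into the shared latent and then decoded into the target layer $\ell'$:

$$f = \mathrm{ReLU}\!\left(W_{\text{enc}}^{\ell}\, a^{\ell} + b_{\text{enc}}\right), \qquad \hat{a}^{\ell'} = W_{\text{dec}}^{\ell'}\, f + b_{\text{dec}}^{\ell'}$$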
Each input-output layer pair has its own encoder-decoder weight pair. Unlike cross-layer transcoders (Circuit Tracing), the encoders are not shared; instead, the layers share the same latent space, which is achieved through a loss-based approximation.
This is done by co-training with an alignment loss that forces the per-layer latents to match, so they converge to a shared latent space.
https://transformer-circuits.pub/2024/crosscoders/index.html
The original purpose is cross-layer mapping and feature alignment, but crosscoders can also be used for other things, such as model diffing and scaling transfer.
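A minimal PyTorch sketch of the per-layer encoder/decoder layout and the alignment co-training described above; the module, function, and parameter names here (LayerwiseCrosscoder, crosscoder_loss, alignment_weight) are illustrative assumptions, not taken from the post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerwiseCrosscoder(nn.Module):
    """One encoder/decoder pair per layer; all pairs target the same latent size."""

    def __init__(self, d_model: int, d_latent: int, n_layers: int):
        super().__init__()
        # Per-layer encoders: map a layer's activations into the shared latent space.
        self.encoders = nn.ModuleList([nn.Linear(d_model, d_latent) for _ in range(n_layers)])
        # Per-layer decoders: map latents back into that layer's activation space.
        self.decoders = nn.ModuleList([nn.Linear(d_latent, d_model) for _ in range(n_layers)])

    def encode(self, acts):   # acts: list of [batch, d_model] tensors, one per layer
        return [F.relu(enc(a)) for enc, a in zip(self.encoders, acts)]

    def decode(self, latents):
        return [dec(f) for dec, f in zip(self.decoders, latents)]


def crosscoder_loss(model, acts, alignment_weight: float = 1.0):
    latents = model.encode(acts)
    recons = model.decode(latents)
    # Per-layer reconstruction error.
    recon = sum(F.mse_loss(r, a) for r, a in zip(recons, acts))
    # Alignment term: pull each layer's latent code toward the mean code so that
    # the separately parameterized encoders converge to one shared latent space.
    mean_latent = torch.stack(latents).mean(dim=0)
    align = sum(F.mse_loss(f, mean_latent.detach()) for f in latents)
    # (A sparsity term on the latents would be added here; see the decoder sparsity loss below.)
    return recon + alignment_weight * align
```

Once the latents are aligned, a source layer's latent can be decoded with any target layer's decoder, which is what the cross-layer mapping relies on.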

Decoder sparsity loss
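A reconstruction of the loss from the crosscoders post linked above (my reading of it; the exact scaling may differ): each latent's L1 penalty is weighted by the summed norms of its per-layer decoder vectors, so a feature pays for every layer it writes to.

$$\mathcal{L} = \sum_{\ell} \left\| a^{\ell}(x) - \hat{a}^{\ell}(x) \right\|^{2} \;+\; \lambda \sum_{i} f_{i}(x) \sum_{\ell} \left\| W_{\text{dec}}^{\ell,\, i} \right\|$$

This weighting is also what makes decoder norms meaningful for model diffing below: a latent that one model stops using has its decoder vector for that model pushed toward zero.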

Using a Crosscoder for chat Model Diffing reveals issues with the traditional L1 sparsity approach: many "chat-specific features" are falsely identified because they are actually existing concepts whose decoder shrinks to zero in one model during training. Most chat-exclusive latents are training artifacts rather than genuinely new capabilities.
Complete Shrinkage → a shared concept whose decoder in one of the models has shrunk to zero.
Latent Decoupling → the same concept represented by different latent combinations in the two models.
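A sketch of how such cases are usually flagged, via the relative decoder norm between the two models' decoders of a jointly trained crosscoder; the function name and thresholds are illustrative assumptions.

```python
import torch


def classify_latents(base_dec: torch.Tensor, chat_dec: torch.Tensor, eps: float = 1e-8):
    """base_dec, chat_dec: [d_latent, d_model] decoder weights of the same crosscoder,
    one decoder per model, trained jointly on base- and chat-model activations."""
    base_norm = base_dec.norm(dim=-1)
    chat_norm = chat_dec.norm(dim=-1)
    # Relative norm in [0, 1]: ~0 means only the base decoder uses the latent,
    # ~1 means only the chat decoder uses it, ~0.5 means it is shared.
    rel = chat_norm / (base_norm + chat_norm + eps)
    return {
        "base_only": rel < 0.1,                 # chat-side decoder fully shrunk
        "shared": (rel >= 0.1) & (rel <= 0.9),
        "chat_only": rel > 0.9,                 # candidate chat-specific latents (may still be artifacts)
    }
```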
Using Top-K (L0-style) sparsity instead of L1 reduces these false positives and retains mostly genuinely alignment-related features. The effects of chat tuning are then primarily not about capabilities themselves but about safety/refusal mechanisms, dialogue-format processing, response-length and summarization control, and template-token-based control. In other words, chat tuning acts more like a shallow layer that steers existing capabilities.
arxiv.org
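A minimal sketch of the Top-K activation mentioned above, shown per sample for simplicity; the cited work uses a batch-level variant (BatchTopK), and the value of k here is an arbitrary assumption.

```python
import torch


def topk_latents(pre_acts: torch.Tensor, k: int = 32) -> torch.Tensor:
    """Keep the k largest pre-activations per sample and zero the rest: an exact
    L0 budget instead of an L1 penalty on the latent activations."""
    values, indices = pre_acts.topk(k, dim=-1)
    sparse = torch.zeros_like(pre_acts)
    return sparse.scatter(-1, indices, torch.relu(values))
```

Because the sparsity budget is a hard L0 constraint rather than a norm penalty, there is no incentive to shrink one model's decoder to zero, which is the usual explanation for why the false chat-only positives disappear.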
 

Recommendations