- BatchTopK Cross-Coder learns sparse vector representations of latent features between two models
- Latent Scaling technique quantifies whether each feature was strengthened, weakened, or maintained
- LLM-based automatic interpretation and classification assigns meaning to features
Results
openreview.net
https://openreview.net/pdf/e3d7788d298b36a20982e258e22f1574225e3328.pdf

Seonglae Cho