Metrics
- Feature Absorption (lower is better)
- Spurious Correlation Removal (SCR) (higher is better)
- Targeted Probe Perturbation (TPP)
- Automated Interpretability
- Sparse Probing
- Reconstruction Error (L2 loss, lower is better)
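Two of these metrics are simple to state concretely. Below is a minimal sketch (not SAEBench's actual implementation) of computing reconstruction error (L2 loss) and sparsity (mean L0) for a toy ReLU sparse autoencoder; all shapes, weights, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, n = 16, 64, 128          # model dim, dictionary size, batch

# Random (untrained) toy SAE weights -- purely illustrative
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_enc = np.zeros(d_sae)

x = rng.normal(size=(n, d_model))        # activations to reconstruct

f = np.maximum(x @ W_enc + b_enc, 0)     # ReLU encoder -> latent features
x_hat = f @ W_dec                        # decoder reconstruction

l2_loss = np.mean(np.sum((x - x_hat) ** 2, axis=-1))  # lower is better
l0 = np.mean(np.count_nonzero(f, axis=-1))            # avg active latents

print(f"L2 loss: {l2_loss:.3f}, mean L0: {l0:.1f}")
```

In practice the trade-off below (dictionary size vs. interpretability, sparsity vs. TPP) is about how training pushes `l0` down while keeping `l2_loss` low.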
Insights
- SAEs with selective activations (TopK, Gated, JumpReLU) perform better than the standard ReLU SAE, but often show higher Feature Absorption.
- A small dictionary size improves interpretability, while a large dictionary size reduces reconstruction error.
- Low sparsity is suitable for interpretability, whereas high sparsity is more effective for TPP.
- The TopK SAE is sample-efficient, but longer training times may increase Feature Absorption.
Overall, if interpretability matters most, the TopK SAE with low sparsity is a good choice. To capture high-level context for complex tasks, the JumpReLU SAE with a wide dictionary size and high sparsity may be a better fit.
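The difference between the two recommended architectures comes down to the activation function. Below is an illustrative sketch (toy values, not any library's implementation) of the two schemes: TopK keeps only the k largest pre-activations per sample, while JumpReLU zeroes anything below a threshold (learned per-latent in the real architecture; a fixed scalar here for simplicity).

```python
import numpy as np

def topk_activation(pre, k):
    # Keep the k largest entries per row, zero the rest
    out = np.zeros_like(pre)
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    np.put_along_axis(out, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return np.maximum(out, 0)

def jumprelu_activation(pre, theta):
    # Pass values through only where they exceed the threshold theta
    return np.where(pre > theta, pre, 0.0)

pre = np.array([[0.9, -0.2, 0.4, 0.05, 1.3]])
print(topk_activation(pre, k=2))      # only the two largest entries survive
print(jumprelu_activation(pre, 0.3))  # only entries above 0.3 survive
```

TopK fixes the number of active latents directly (exact L0 control), whereas JumpReLU lets the count vary per input, which is one reason their sparsity/absorption trade-offs differ.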
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders - Dec 2024
Adam Karvonen*, Can Rager*, Johnny Lin*, Curt Tigges*, Joseph Bloom*,
David Chanin, Yeu-Tong Lau, Eoin Farrell, Arthur Conmy, Callum McDougall, Kola Ayonrinde, Matthew Wearden,
Samuel Marks, Neel Nanda
*equal contribution
https://www.neuronpedia.org/sae-bench/info
Explorer
Results
adamkarvonen/new_sae_bench_results at main
https://huggingface.co/datasets/adamkarvonen/new_sae_bench_results/tree/main/core_with_feature_statistics

Seonglae Cho