SAE Benchmark

Creator
Seonglae Cho
Created
2024 Dec 18 16:09
Editor
Seonglae Cho
Edited
2025 Dec 22 17:38
Refs
Topic model
  • fewer total features (a smaller dictionary)
  • lower reconstruction loss
  • sparsity (fewer simultaneously active features per input)
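These three criteria trade off against one another: a larger dictionary can reconstruct better, while forcing fewer active features usually costs reconstruction quality. The minimal Python sketch below measures each criterion, assuming a plain ReLU SAE (f = ReLU(W_enc x + b_enc), x_hat = W_dec f + b_dec) with hypothetical toy weights: dictionary size, mean squared reconstruction error, and mean L0 (the average count of nonzero features per input). It is an illustration only, not how SAEBench or any other listed benchmark computes its scores.

```python
# Minimal sketch: the three SAE trade-off metrics for a toy ReLU SAE.
# Assumed architecture (not taken from any specific benchmark):
#   f = ReLU(W_enc x + b_enc),  x_hat = W_dec f + b_dec
# All names and weights below are hypothetical illustrations.
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one inner list per row


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for row-major nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Feature activations: one non-negative value per dictionary feature."""
    return [max(0.0, z + b) for z, b in zip(matvec(w_enc, x), b_enc)]


def decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruct the input from feature activations."""
    return [z + b for z, b in zip(matvec(w_dec, f), b_dec)]


def sae_metrics(xs: List[Vector], w_enc: Matrix, b_enc: Vector,
                w_dec: Matrix, b_dec: Vector) -> Dict[str, float]:
    total_sq_err = 0.0
    total_active = 0
    for x in xs:
        f = encode(x, w_enc, b_enc)
        x_hat = decode(f, w_dec, b_dec)
        # Per-input mean squared reconstruction error.
        total_sq_err += sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        # L0: how many features are active at the same time for this input.
        total_active += sum(1 for a in f if a > 0.0)
    return {
        "n_features": len(w_enc),           # total features (dictionary size)
        "mse": total_sq_err / len(xs),      # reconstruction loss
        "mean_l0": total_active / len(xs),  # sparsity
    }


xs = [[1.0, 0.0], [0.5, -2.0]]

# 4 features (+/- each axis): exact reconstruction, denser codes.
big = sae_metrics(xs,
                  [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]], [0.0] * 4,
                  [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]], [0.0, 0.0])

# 2 features (positive axes only): smaller and sparser, but lossy.
small = sae_metrics(xs,
                    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

print(big)    # {'n_features': 4, 'mse': 0.0, 'mean_l0': 1.5}
print(small)  # {'n_features': 2, 'mse': 1.0, 'mean_l0': 1.0}
```

In this toy run, the 4-feature dictionary reconstructs both inputs exactly but activates 1.5 features per input on average. The 2-feature dictionary is smaller and sparser, yet it cannot represent the negative coordinate of the second input, so its MSE rises to 1.0.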
SAE Benchmarks
SAEBench
SAE Steerability
Feature Monosemanticity Score

Copyright Seonglae Cho