Some features may contain only binary information, while others may require higher precision information
Overcomplete basis of SAEs result in multiple ways of interpretability (problem statement)
reconsidering MDL
Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs — LessWrong
We present an information-theoretic framework for interpreting SAEs as lossy compression algorithms for communicating explanations of neural activations. We appeal to the Minimal Description Length (MDL) principle to motivate explanations of activations which are both accurate and concise.
https://www.lesswrong.com/posts/G2oyFQFTE5eGEas6m/interpretability-as-compression-reconsidering-sae

Seonglae Cho