SAE pathological error

Substituting the SAE's reconstructed activation vector back into the model causes a much larger increase in next-token prediction error than substituting a random vector at the same distance from the original activation. In other words, SAE reconstruction error has a systematic, abnormally harmful effect on model performance, which sets it apart from simple noise or random error.
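A minimal sketch of the comparison in plain Python, under loudly stated assumptions: everything here is a hypothetical toy. `W` stands in for "the rest of the model" (a linear unembedding followed by softmax), KL divergence between next-token distributions is assumed as the error measure, and the vectors are made up. The random baseline normalizes an isotropic Gaussian sample, which gives a direction uniform on the sphere, so every random vector sits at exactly the same L2 distance ϵ from the original as the reconstruction does.

```python
import math
import random

def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def random_at_distance(x, eps, rng):
    """Return a vector at exactly L2 distance eps from x, in a uniformly random direction."""
    # A normalized isotropic Gaussian sample is uniform on the unit sphere.
    u = [rng.gauss(0.0, 1.0) for _ in x]
    norm = math.sqrt(sum(v * v for v in u))
    return [xi + eps * ui / norm for xi, ui in zip(x, u)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_probs(h, W):
    """Hypothetical toy 'rest of the model': linear unembedding W, then softmax."""
    return softmax([sum(w * hi for w, hi in zip(row, h)) for row in W])

def kl(p, q):
    """KL(p || q) for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def compare(x, x_hat, W, n_samples=100, seed=0):
    """KL caused by the reconstruction vs. mean KL of random vectors at the same distance."""
    rng = random.Random(seed)
    eps = l2_distance(x, x_hat)
    p_orig = next_token_probs(x, W)
    kl_recon = kl(p_orig, next_token_probs(x_hat, W))
    kl_rand = [
        kl(p_orig, next_token_probs(random_at_distance(x, eps, rng), W))
        for _ in range(n_samples)
    ]
    return eps, kl_recon, sum(kl_rand) / n_samples

# Toy usage (hypothetical numbers): the reconstruction error lies entirely along
# dimension 0, a direction the toy unembedding W is especially sensitive to.
W = [[2.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 0.1, 0.0],
     [0.0, 0.0, 0.0, 0.1]]
x = [1.0, 0.0, 0.0, 0.0]
x_hat = [0.5, 0.0, 0.0, 0.0]
eps, kl_recon, kl_rand = compare(x, x_hat, W)
print(f"eps={eps:.3f}  KL(recon)={kl_recon:.4f}  mean KL(random)={kl_rand:.4f}")
```

Because the toy error was deliberately placed along the most sensitive direction, the reconstruction scores worse than the random baseline here by construction; the sketch only shows how the measurement is set up, not evidence for the finding. In a real experiment `x` would be a model activation, `x_hat` the SAE output, and both KL terms would be averaged over many tokens.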

Source: "SAE reconstruction errors are (empirically) pathological" (LessWrong)

Summary: Sparse Autoencoder (SAE) errors are empirically pathological: when a reconstructed activation vector is distance ϵ from the original activati…

https://www.lesswrong.com/posts/rZPiuFxESMxCDHe4B/sae-reconstruction-errors-are-empirically-pathological


Recommendations