An SAE-reconstructed activation vector causes much larger errors in next-token prediction than random vectors at the same distance from the original vector. In other words, the reconstruction error has a systematic, abnormally large negative impact on model performance, which distinguishes it from simple noise or random error.
SAE reconstruction errors are (empirically) pathological — LessWrong
Summary Sparse Autoencoder (SAE) errors are empirically pathological: when a reconstructed activation vector is distance ϵ from the original activati…
https://www.lesswrong.com/posts/rZPiuFxESMxCDHe4B/sae-reconstruction-errors-are-empirically-pathological
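
The random baseline in this comparison can be sketched as follows. This is a minimal illustration, not the linked post's code: it assumes "distance" means the L2 norm, and the function name `random_vector_at_same_distance`, the 768-dimensional size, and the noisy stand-in for an SAE reconstruction are all hypothetical.

```python
import numpy as np


def random_vector_at_same_distance(x, x_hat, rng):
    """Return a vector whose L2 distance from x equals ||x_hat - x||,
    but whose offset direction is drawn uniformly at random."""
    eps = np.linalg.norm(x_hat - x)           # reconstruction error magnitude
    direction = rng.standard_normal(x.shape)  # isotropic Gaussian sample
    direction /= np.linalg.norm(direction)    # normalize to a unit vector
    return x + eps * direction


rng = np.random.default_rng(0)
x = rng.standard_normal(768)                # hypothetical original activation
x_hat = x + 0.1 * rng.standard_normal(768)  # stand-in for an SAE reconstruction
x_rand = random_vector_at_same_distance(x, x_hat, rng)

# Both perturbed vectors sit at the same distance from the original.
print(np.linalg.norm(x_hat - x), np.linalg.norm(x_rand - x))
```

To reproduce the comparison, one would substitute `x_hat` and `x_rand` in turn for the original activation in the model's forward pass and compare the resulting next-token loss; that step depends on the model and is not shown here.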

Seonglae Cho