The activation vector reconstructed by an SAE causes much larger errors in next-token prediction than random vectors at the same distance from the original vector. In other words, the reconstruction error has a systematic, abnormally large negative impact on model performance, which distinguishes it from simple noise or random error.
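The comparison described above needs a random baseline that sits exactly as far from the original activation as the reconstruction does. The following is a minimal sketch of building that baseline, not the method of any particular paper or library: the function name `random_vector_at_same_distance`, the use of L2 distance, and the toy vectors are all assumptions made for illustration, and the "reconstruction" here is a stand-in rather than the output of a real SAE.

```python
import math
import random


def l2_norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def random_vector_at_same_distance(x, x_hat, rng):
    """Hypothetical helper: return x + eps, where eps points in a random
    direction and has the same L2 norm as the reconstruction error x_hat - x."""
    target = l2_norm([a - b for a, b in zip(x_hat, x)])
    # An isotropic Gaussian sample gives a uniformly random direction.
    direction = [rng.gauss(0.0, 1.0) for _ in x]
    scale = target / l2_norm(direction)
    return [xi + scale * di for xi, di in zip(x, direction)]


rng = random.Random(0)
# Toy "original" activation (assumption: 16 dimensions, Gaussian entries).
x = [rng.gauss(0.0, 1.0) for _ in range(16)]
# Stand-in for an SAE reconstruction; a real one would come from the SAE decoder.
x_hat = [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]
# Random baseline at the same distance from x as x_hat.
x_rand = random_vector_at_same_distance(x, x_hat, rng)
```

To test for the pathological error, one would substitute `x_hat` and `x_rand` in turn for the original activation in the model's forward pass and compare the resulting next-token loss; that step depends on the model and is not shown here.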
SAE pathological error
Creator: Seonglae Cho
Created: 2024 Nov 19 22:34
Editor: Seonglae Cho
Edited: 2025 Jan 8 20:49
Refs