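SAEs routinely underestimate the intensity of a given feature. This shrinkage comes from the sparsity penalty applied during training: the penalty charges for every unit of activation, so the SAE can lower its loss by reporting a slightly smaller activation than the true one. The SAE may also underestimate a feature’s intensity to account for other features that will interfere with it.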
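The shrinkage effect can be illustrated with a toy calculation. This is a minimal sketch and not a real SAE: it assumes a single feature with a fixed unit-norm decoder direction, a squared reconstruction error, and an L1 penalty on the activation. The names `best_activation`, `true_intensity`, and `l1_coeff` are hypothetical and chosen only for illustration. Under these assumptions the loss for an activation f is (a - f)^2 + l1_coeff * f, where a is the true intensity, and its minimizer is f = a - l1_coeff / 2, which is always below a.

```python
import numpy as np


def best_activation(true_intensity: float, l1_coeff: float) -> float:
    """Grid-search the activation f >= 0 that minimizes
    (true_intensity - f)^2 + l1_coeff * f.

    Toy stand-in for an SAE with one feature and a unit-norm decoder.
    """
    candidates = np.linspace(0.0, 2.0 * true_intensity, 20001)
    reconstruction_error = (true_intensity - candidates) ** 2
    sparsity_penalty = l1_coeff * candidates
    loss = reconstruction_error + sparsity_penalty
    return float(candidates[np.argmin(loss)])


true_intensity = 1.0  # assumed ground-truth feature intensity
l1_coeff = 0.2        # assumed sparsity coefficient

estimated = best_activation(true_intensity, l1_coeff)
closed_form = max(true_intensity - l1_coeff / 2.0, 0.0)

print(f"true intensity:      {true_intensity}")
print(f"estimated intensity: {estimated:.4f}")
print(f"closed-form optimum: {closed_form:.4f}")
```

In this toy setting the gap between the true and estimated intensity grows linearly with `l1_coeff`, and setting `l1_coeff` to zero removes it. Interference from other features is not modeled here.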
Pathological error