SAE Decoder Loss

Creator: Seonglae Cho
Created: 2025 Mar 13 17:23
Edited: 2025 May 20 18:46

SAE Feature Direction Loss

The SAE Feature Absorption and feature co-occurrence problems cause the SAE to learn "broken latents". Tied SAEs, which share weights between encoder and decoder (the encoder is the transpose of the decoder), learn cleaner representations, but problems still arise when there are too few latents to represent related concepts, such as parent-child feature hierarchies, separately.
To mitigate this mixing, an auxiliary loss is introduced: the squared cosine similarity between the input and a latent's feature direction, computed when that latent is in a low-activation state. Minimizing it encourages each latent's activation strength to have a single peak.
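A minimal NumPy sketch of the auxiliary loss described above, for illustration only. The function name, the `threshold` that defines a "low" activation, and the choice to count only latents that are active but below that threshold are assumptions, not details from the source. Decoder rows are taken as the feature directions.

```python
import numpy as np

def low_activation_cosine_loss(x, f, W_dec, threshold=0.1, eps=1e-8):
    """Hypothetical sketch of the squared-cosine auxiliary loss.

    x:         (batch, d_model) SAE inputs
    f:         (batch, n_latents) latent activations
    W_dec:     (n_latents, d_model) decoder rows, used as feature directions
    threshold: hypothetical cutoff below which an activation counts as "low"
    """
    # Unit-normalize inputs and feature directions so dot products are cosines.
    x_unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    d_unit = W_dec / (np.linalg.norm(W_dec, axis=1, keepdims=True) + eps)
    cos = x_unit @ d_unit.T  # (batch, n_latents)

    # Assumption: a "low activation state" means active but below the threshold.
    low = (f > 0) & (f < threshold)

    # Average the squared cosine over (input, latent) pairs in that state.
    n_low = low.sum()
    if n_low == 0:
        return 0.0
    return float((cos[low] ** 2).sum() / n_low)

x = np.array([[1.0, 0.0], [0.0, 1.0]])
W_dec = np.eye(2)
f = np.array([[0.05, 0.0], [0.0, 0.9]])  # latent 0 fires weakly on input 0
loss = low_activation_cosine_loss(x, f, W_dec)
```

During training this term would presumably be added to the usual reconstruction and sparsity losses with some coefficient, with gradients supplied by an autodiff framework rather than NumPy; the coefficient and the mask definition are further assumptions.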

Tanh loss

Achieves Pareto-optimality between sparsity and reconstruction error by "minimizing feature activations while maintaining low output error".
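A minimal NumPy sketch of a tanh sparsity penalty combined with reconstruction error. The exact form (tanh of each activation times its decoder-row norm), the `scale` and `sparsity_coeff` hyperparameters, and the function name are assumptions for illustration, not a confirmed specification of the tanh loss named above.

```python
import numpy as np

def tanh_sae_loss(x, x_hat, f, W_dec, sparsity_coeff=1.0, scale=1.0):
    """Hypothetical sketch: reconstruction MSE plus a tanh sparsity penalty.

    x, x_hat: (batch, d_model) inputs and SAE reconstructions
    f:        (batch, n_latents) non-negative latent activations
    W_dec:    (n_latents, d_model) decoder rows
    sparsity_coeff, scale: hypothetical hyperparameters
    """
    # Output error: squared error summed over features, averaged over the batch.
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # Assumption: weight each activation by its decoder-row norm.
    dec_norms = np.linalg.norm(W_dec, axis=1)  # (n_latents,)
    # tanh is bounded by 1 per latent: weak activations are penalized almost
    # linearly, while large ones saturate and are barely pushed down further.
    penalty = np.mean(np.sum(np.tanh(scale * f * dec_norms), axis=1))
    return float(mse + sparsity_coeff * penalty)

x = np.array([[1.0, 0.0]])
W_dec = np.eye(2)
loss_sparse = tanh_sae_loss(x, x, np.array([[1e6, 0.0]]), W_dec)
```

Unlike an L1 penalty, whose gradient is constant, the tanh gradient vanishes for large activations, so the penalty mainly drives weak activations toward zero while leaving strong ones nearly untouched. This is one plausible reading of how it keeps output error low while minimizing activations.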

Recommendations