Tied SAE

Creator
Seonglae Cho
Created
2025 Feb 25 12:14
Editor
Edited
2025 Mar 13 20:4
Refs
Cosine similarity loss between each feature's direction in the encoder matrix and its direction in the decoder matrix
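A minimal sketch of such a loss, assuming each feature has one row in the encoder matrix and one row in the decoder matrix (decoder stored transposed), and assuming the penalty is the mean of 1 - cos over features. The function names and the 1 - cos form are illustrative assumptions, not taken from a specific implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def encoder_decoder_alignment_loss(encoder_rows, decoder_rows):
    """Mean of (1 - cos) between matching encoder and decoder feature directions.

    Assumption: row i of each matrix is the direction of feature i.
    The loss is close to 0 when every encoder direction is parallel to its
    decoder direction, which is the situation a tied SAE enforces by construction.
    """
    sims = [cosine_similarity(e, d) for e, d in zip(encoder_rows, decoder_rows)]
    return sum(1.0 - s for s in sims) / len(sims)
```

Directions that differ only in scale give a loss near 0, while orthogonal directions give a loss near 1.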
 
 
 
 
 

Untied SAE

The SAE Feature Absorption and co-occurrence problems cause the SAE to learn "broken latents". Tied SAEs have cleaner representations because their encoder and decoder weights are identical, but issues still arise when there are too few latents to cover the concepts, for example in parent-child relationships.
To mitigate this mixing phenomenon, an auxiliary loss is introduced: the squared cosine similarity between inputs and feature directions, applied at low activation states. It encourages each latent's activation strength to have a single peak.
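The auxiliary loss could be sketched roughly as follows. This is an illustration under stated assumptions, not the original implementation: the tied SAE is assumed to use one direction per latent for both encoding and decoding with a ReLU activation, "low activation state" is assumed to mean an activation below a hypothetical `threshold`, and the penalty is averaged over all qualifying (input, latent) pairs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def low_activation_penalty(inputs, directions, biases, threshold=0.1):
    """Mean squared cosine similarity over low-activation (input, latent) pairs.

    inputs:     list of input vectors
    directions: one feature direction per latent (tied: shared by encoder and decoder)
    biases:     one encoder bias per latent
    threshold:  hypothetical cutoff below which a latent counts as "low activation"
    """
    total, count = 0.0, 0
    for x in inputs:
        for d, b in zip(directions, biases):
            pre_activation = sum(di * xi for di, xi in zip(d, x)) + b
            activation = max(0.0, pre_activation)  # ReLU
            if activation < threshold:
                # Penalize inputs that still point partly along this latent's
                # direction while the latent is barely (or not) firing.
                total += cosine_similarity(x, d) ** 2
                count += 1
    return total / count if count else 0.0
```

In training, this term would be added to the reconstruction loss with a small weight. Both the weight and the threshold are hyperparameters that this note does not specify.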
 
 
