SAE Feature Direction

Creator
Seonglae Cho
Created
2025 Feb 17 1:46
Edited
2025 Feb 21 22:52

An SAE feature direction is a row of the SAE decoder matrix.
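As a minimal sketch of this definition, the snippet below reads a feature direction out of a decoder weight matrix. It assumes the decoder is stored with shape (n_features, d_model) and ignores any decoder bias; some implementations store the transpose, in which case the direction is a column instead. The sizes, the random weights, and the helper name `feature_direction` are hypothetical and only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, d_model = 8, 4  # hypothetical sizes, for illustration only

# Hypothetical decoder weights; assumed layout is (n_features, d_model).
W_dec = rng.normal(size=(n_features, d_model))


def feature_direction(W_dec, i, normalize=True):
    """Return row i of the decoder, i.e. the direction of feature i."""
    d = W_dec[i]
    if normalize:
        d = d / np.linalg.norm(d)  # unit-length direction
    return d


# Decoding (without bias) is a weighted sum of feature directions:
# a single active feature reconstructs a scaled copy of its decoder row.
acts = np.zeros(n_features)
acts[3] = 2.0
x_hat = acts @ W_dec  # same as 2.0 * W_dec[3]
```

With only feature 3 active, `x_hat` points along `feature_direction(W_dec, 3)`, which is why the decoder row is read as the direction that feature writes into the model's activation space.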


Feature directions depend on the random seed and the training dataset (see SAE Training).

Weight Cosine Similarity
Orphan features (features in one SAE with no close match in an SAE trained with a different seed) still show high interpretability, which suggests that each seed may have found a different subset of an “idealized dictionary”.
  • The dataset matters more than the seed.
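The comparison behind these observations can be sketched as follows: compute the cosine similarity between every pair of decoder rows from two SAEs and flag rows with no close counterpart as orphans. This is an illustrative simplification, not the exact procedure of any particular study: it uses a simple best-match rule rather than a one-to-one matching, and the 0.7 threshold, the function names, and the toy weights are all assumptions.

```python
import numpy as np


def cosine_similarity_matrix(W_a, W_b):
    """Pairwise cosine similarity between rows of W_a and rows of W_b."""
    A = W_a / np.linalg.norm(W_a, axis=1, keepdims=True)
    B = W_b / np.linalg.norm(W_b, axis=1, keepdims=True)
    return A @ B.T  # shape (n_features_a, n_features_b)


def orphan_features(W_a, W_b, threshold=0.7):
    """Indices of features in A whose best match in B is below threshold.

    The threshold is a hypothetical choice for this sketch.
    """
    sims = cosine_similarity_matrix(W_a, W_b)
    best_match = sims.max(axis=1)  # best similarity for each feature in A
    return np.flatnonzero(best_match < threshold)


# Toy decoders built from basis vectors so the result is easy to check:
# A has directions e0..e3; B has a rescaled e1, then e0, e4, e5.
eye = np.eye(6)
W_a = eye[[0, 1, 2, 3]]
W_b = np.stack([3.0 * eye[1], eye[0], eye[4], eye[5]])

orphans = orphan_features(W_a, W_b)  # features 2 and 3 have no match in B
```

Cosine similarity ignores the norm of each row, so the rescaled copy of `e1` in `W_b` still counts as a match. To compare the effect of seed and dataset, the same function could be run once on a pair of SAEs that differ only in seed and once on a pair that differ only in dataset, then the orphan counts compared.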
