SAE Feature Visualization

Creator
Seonglae Cho
Created
2025 Feb 15 21:20
Edited
2025 Feb 24 21:34

UMAP

Browsing a code-error feature

Layer-wise visualization analysis

  • An SAE's reconstruction performance degrades sharply once the input exceeds the context length the SAE was trained on
  • In short contexts, reconstruction is worse in later layers, whereas in long contexts it is the early layers that degrade
  • Steering with most SAE features hurts model performance, but steering with some features improves it
  • Reconstruction errors from early-layer SAEs propagate and hurt the performance of later layers
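
The first observation above can be checked by comparing reconstruction error at token positions inside and beyond the training context length. The following is a minimal sketch, not the method used for the analysis in this note. It assumes a standard ReLU SAE that subtracts the decoder bias before encoding, and it uses random toy weights and activations. All names (`sae_reconstruct`, `normalized_mse`, `error_by_position`, `train_ctx_len`) are hypothetical.

```python
import random

def sae_reconstruct(x, W_enc, b_enc, W_dec, b_dec):
    """One SAE forward pass (assumed architecture): ReLU encoder, linear decoder."""
    # Feature activations: f = ReLU(W_enc (x - b_dec) + b_enc)
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    feats = [max(0.0, sum(w * c for w, c in zip(row, centered)) + b)
             for row, b in zip(W_enc, b_enc)]
    # Reconstruction: x_hat = sum_i f_i * W_dec[i] + b_dec (one decoder row per feature)
    x_hat = [b_dec[j] + sum(f * W_dec[i][j] for i, f in enumerate(feats))
             for j in range(len(x))]
    return feats, x_hat

def normalized_mse(x, x_hat):
    """Squared error divided by the squared norm of x (0 means perfect)."""
    err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    norm = sum(a ** 2 for a in x)
    return err / norm if norm > 0 else 0.0

def error_by_position(acts, sae_params, train_ctx_len):
    """Mean normalized MSE for positions inside vs. beyond the training context."""
    inside, beyond = [], []
    for pos, x in enumerate(acts):
        _, x_hat = sae_reconstruct(x, *sae_params)
        bucket = inside if pos < train_ctx_len else beyond
        bucket.append(normalized_mse(x, x_hat))
    mean = lambda v: sum(v) / len(v) if v else float("nan")
    return mean(inside), mean(beyond)

# Toy stand-ins: in practice acts would be one layer's activations per token
# and the weights would come from a trained SAE.
rng = random.Random(0)
d_model, n_feats, seq_len = 4, 8, 12
W_enc = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(n_feats)]
W_dec = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(n_feats)]
b_enc = [0.0] * n_feats
b_dec = [0.0] * d_model
acts = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]

inside_err, beyond_err = error_by_position(
    acts, (W_enc, b_enc, W_dec, b_dec), train_ctx_len=8)
print(f"inside: {inside_err:.3f}  beyond: {beyond_err:.3f}")
```

With random weights the two numbers carry no meaning. With a trained SAE and real activations, running the same comparison per layer would give the kind of layer-wise view described above.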
 
 
 

 
