SAE Feature Structure

Creator
Seonglae Cho
Created
2025 Jan 8 20:25
Edited
2025 Mar 6 0:8
SAE latent vectors are not independent; they form clusters that activate together in predictable ways. Although each latent is functionally separate, real dependencies exist between them, so their interactions and compositional characteristics matter for interpretability. This clustering is particularly evident in smaller SAEs, and the clusters can be analyzed through L0 regularization.

Compositionality and Ambiguity:  Latent Co-occurrence and Interpretable Subspaces — LessWrong
Matthew A. Clarke, Hardik Bhatnagar and Joseph Bloom
When two features frequently activate at the same time, we say they co-occur (their activations are highly correlated).
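As a rough illustration of what "co-occur" can mean in practice, the sketch below scores pairs of latents by how often they are active on the same tokens. It is a minimal example under stated assumptions, not the linked post's exact method: the function name `cooccurrence_jaccard`, the `threshold` parameter, the choice of Jaccard similarity as the score, and the toy activation matrix are all hypothetical choices made here for illustration.

```python
import numpy as np

def cooccurrence_jaccard(acts, threshold=0.0):
    """Pairwise Jaccard similarity between SAE latents.

    acts: array of shape (n_tokens, n_latents) holding latent activations.
    A latent counts as "active" on a token when its activation exceeds
    `threshold` (an assumed firing criterion, not taken from the source).
    """
    active = (acts > threshold).astype(np.float64)
    # both[i, j] = number of tokens on which latents i and j are both active
    both = active.T @ active
    counts = np.diag(both)  # number of tokens on which each latent is active
    # union[i, j] = number of tokens on which latent i or latent j is active
    union = counts[:, None] + counts[None, :] - both
    with np.errstate(divide="ignore", invalid="ignore"):
        jaccard = np.where(union > 0, both / union, 0.0)
    return jaccard

# Toy activations: 4 tokens, 3 latents (made-up numbers).
acts = np.array([
    [1.2, 0.8, 0.0],
    [0.5, 0.3, 0.0],
    [0.0, 0.0, 2.0],
    [0.9, 0.0, 0.0],
])
J = cooccurrence_jaccard(acts)
print(np.round(J, 3))
```

In this toy data, latents 0 and 1 fire together on two of the three tokens where either is active, so they score well above latent 2, which never fires alongside them. Thresholding such a matrix and taking connected components is one simple way to recover clusters of co-occurring latents.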
Feature Cooccurrence Explorer

Topological Data Analysis

Modeling SAE features as a graph shows related features developing across the layers, with later layers involving more complex features.
Topological Data Analysis and Mechanistic Interpretability — LessWrong
This article was written in response to a post on LessWrong from the Apollo Research interpretability team. This post represents our initial attempts…