GoodFire Ember Paint

Creator: Seonglae Cho
Created: 2025 Jun 1 11:14
Edited: 2025 Jun 2 10:19

SDXL
sdxl-turbo-interpretability (goodfire-ai, updated 2025 Jul 2 20:15)

Combines SAE and NMF to turn the model's internal representations into human-understandable units, so that the otherwise black-box diffusion model can be manipulated transparently. Hundreds of SAE features are grouped by NMF into a few high-level units (factors), which recombines features that SAE Feature Splitting had separated. In the factorization $V \approx WH$, V is the original SAE activation-strength matrix, each row of H is one high-level factor, and the values in that row are the weights of the corresponding SAE features.
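
As a rough illustration of the factorization described above, the sketch below factors a toy non-negative matrix V into W and H with Lee-Seung multiplicative updates and then reads the dominant SAE features off each row of H. Everything here is an assumption made for illustration: the synthetic data, the helper names `nmf` and `top_features`, and the choice of solver. The source does not say which NMF algorithm or hyperparameters Goodfire used.

```python
import numpy as np


def nmf(V, k, n_iter=1000, eps=1e-9, seed=0):
    """Approximate non-negative V (samples x features) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    W is (samples x k); H is (k x features), one row per factor.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = V.shape
    W = rng.random((n_samples, k)) + eps
    H = rng.random((k, n_features)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_features(H, factor, n=3):
    """Indices of the n SAE features with the largest weight in one factor."""
    return np.argsort(H[factor])[::-1][:n]


# Hypothetical stand-in for SAE activations: 6 features from 2 hidden groups.
true_H = np.array([
    [1.0, 0.8, 0.6, 0.0, 0.0, 0.0],  # group A: features 0-2
    [0.0, 0.0, 0.0, 1.0, 0.7, 0.5],  # group B: features 3-5
])
rng = np.random.default_rng(1)
true_W = np.zeros((40, 2))
true_W[:15, 0] = rng.random(15) + 0.1       # samples showing only group A
true_W[15:30, 1] = rng.random(15) + 0.1     # samples showing only group B
true_W[30:] = rng.random((10, 2)) + 0.1     # samples showing both
V = true_W @ true_H

W, H = nmf(V, k=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
groups = sorted(sorted(top_features(H, f).tolist()) for f in range(2))

print("relative reconstruction error:", rel_error)
print("top features per factor:", groups)
```

Each row of H can then be inspected as one candidate high-level unit, and the matching column of W shows how strongly each sample expresses it. The factor order and scale are arbitrary, so only the grouping of features is meaningful.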
demo
blog
umap
skull
breast
lips
model sae
 
 
