Gradient SAE

Creator: Seonglae Cho
Created: 2025 Jan 8 21:03
Edited: 2025 Jan 8 21:08

g-SAEs (Gradient-aware Sparse Autoencoders)

Existing SAEs are likely to overlook latents that strongly influence the model's output, because they are trained only on input activation values. g-SAEs instead aim to reflect the dual role of latents in both the model's representation and its behavior.
By selecting latents whose changes most affect the output loss, g-SAEs have fewer dead latents (SAE Dead Neuron) than conventional SAEs and use the latent space more efficiently.
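A minimal sketch of this idea, assuming a TopK-style SAE where active latents are ranked by an activation-times-gradient score (the gradient of the downstream model loss with respect to the reconstructed activation, pulled back through the decoder). The class name, the `grad_wrt_x` argument, and the exact scoring rule are illustrative assumptions, not the paper's verbatim recipe.

```python
import torch
import torch.nn as nn


class GradientAwareTopKSAE(nn.Module):
    """Illustrative TopK SAE: active latents are chosen by an
    activation-times-gradient score rather than activation alone."""

    def __init__(self, d_model: int, d_sae: int, k: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.02)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.02)
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        self.k = k

    def forward(self, x: torch.Tensor, grad_wrt_x: torch.Tensor):
        # Latent activations from the input, as in a standard SAE encoder.
        z = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # Pull the downstream-loss gradient back through the decoder so each
        # latent gets a score reflecting its effect on the model's output:
        # score_i ≈ z_i * (∂L/∂x · W_dec_i)   (assumed form of the score)
        grad_per_latent = grad_wrt_x @ self.W_dec.T          # (batch, d_sae)
        score = z * grad_per_latent
        # Keep only the k latents with the largest |score|; zero the rest.
        topk = torch.topk(score.abs(), self.k, dim=-1)
        mask = torch.zeros_like(z).scatter_(-1, topk.indices, 1.0)
        z_sparse = z * mask
        x_hat = z_sparse @ self.W_dec + self.b_dec
        return x_hat, z_sparse


# Toy usage: grad_wrt_x would come from backpropagating the language-model
# loss to the activations being reconstructed; here it is a placeholder.
if __name__ == "__main__":
    sae = GradientAwareTopKSAE(d_model=64, d_sae=512, k=16)
    x = torch.randn(8, 64)
    grad_wrt_x = torch.randn(8, 64)
    x_hat, z = sae(x, grad_wrt_x)
    print(x_hat.shape, (z != 0).sum(dim=-1))  # ≤ k active latents per sample
```

Because selection depends on the output-loss gradient rather than activation magnitude alone, latents that matter for behavior keep receiving gradient updates, which is consistent with the claim of fewer dead latents.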
