Neuron SAE history

Creator: Seonglae Cho
Created: 2024 Oct 24 9:44
Edited: 2024 Oct 24 9:45
Refs

2022 Dec

It turns out that an extremely simple method – training a single-layer autoencoder to reconstruct neural activations with an L1 penalty on the hidden activations – doesn't just find features that minimize the loss, but actually recovers the ground-truth features that generated the data. However, at least with this sparse-coding method, extracting features from superposition is extremely costly (possibly more costly than training the models themselves):
  • The L1 penalty coefficient needs to be tuned just right
  • We need more learned features than ground-truth features
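The setup above can be sketched end to end: generate activations from sparse combinations of known ground-truth directions, then train an overcomplete single-layer autoencoder with an L1 penalty on the hidden code. All sizes, the L1 coefficient, and the learning rate are hypothetical choices for illustration, not values from the original work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "ground truth": activations are sparse combinations of a few directions
d_model, n_true, n_learned, n_samples = 16, 8, 32, 2048
F = rng.normal(size=(n_true, d_model))
F /= np.linalg.norm(F, axis=1, keepdims=True)
codes = rng.random((n_samples, n_true)) * (rng.random((n_samples, n_true)) < 0.1)
X = codes @ F  # the activations the autoencoder must reconstruct

# Single-layer autoencoder; note n_learned > n_true (overcomplete dictionary)
We = rng.normal(scale=0.1, size=(n_learned, d_model))
be = np.zeros(n_learned)
Wd = rng.normal(scale=0.1, size=(d_model, n_learned))
bd = np.zeros(d_model)
lam, lr = 1e-3, 0.05  # hypothetical L1 coefficient and learning rate

def step(x):
    h_pre = x @ We.T + be          # encoder pre-activation
    h = np.maximum(h_pre, 0.0)     # ReLU hidden code
    x_hat = h @ Wd.T + bd          # linear decoder
    err = x_hat - x
    loss = (err ** 2).mean() + lam * np.abs(h).mean()
    # Manual gradients of MSE + L1-on-hidden-activations
    g_xhat = 2 * err / err.size
    g_Wd = g_xhat.T @ h
    g_bd = g_xhat.sum(0)
    g_h = g_xhat @ Wd + lam * np.sign(h) / h.size
    g_pre = g_h * (h_pre > 0)      # ReLU gradient
    g_We = g_pre.T @ x
    g_be = g_pre.sum(0)
    for p, g in ((We, g_We), (be, g_be), (Wd, g_Wd), (bd, g_bd)):
        p -= lr * g                # in-place SGD update
    return loss

losses = [step(X[i:i + 64]) for _ in range(20) for i in range(0, n_samples, 64)]
```

The reconstruction-plus-L1 loss should fall over training; recovering the true directions additionally depends on the L1 coefficient being in the right range, which is exactly the tuning cost the note points to.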
2023

2024

Anthropic
OpenAI
OpenAI introduced a k-sparse autoencoder (TopK activation) to control sparsity directly, improving the reconstruction-sparsity frontier (tradeoff) and finding scaling laws.
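The key mechanism is that sparsity is enforced architecturally rather than via an L1 penalty: only the k largest hidden pre-activations per sample are kept, so the L1 coefficient no longer needs tuning. A minimal sketch of such a TopK encoder (shapes and k are illustrative, not the published configuration):

```python
import numpy as np

def topk_encode(x, We, be, k):
    """Keep only the k largest pre-activations per sample; zero out the rest."""
    h = x @ We.T + be
    idx = np.argpartition(h, -k, axis=1)[:, -k:]  # positions of the top-k entries per row
    mask = np.zeros_like(h)
    np.put_along_axis(mask, idx, 1.0, axis=1)
    return h * mask  # at most k active latents per sample, by construction

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 16))           # a small batch of activations
We = rng.normal(size=(64, 16))         # overcomplete encoder weights
h = topk_encode(x, We, np.zeros(64), k=8)
```

Because the number of active latents is fixed at k, one can sweep k to trace out the reconstruction-sparsity frontier directly, which is what makes the scaling-law analysis clean.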