SAE Scaling

Creator
Seonglae Cho
Created
2025 Feb 15 20:44
Edited
2025 Mar 8 12:29
Refs

Scaling Law for SAEs

  • Loss decreases approximately as a power law in compute
  • As the compute budget grows, the optimal FLOPs allocation to both the number of training steps and the number of features grows approximately as a power law
  • At the compute budgets tested, the optimal number of features grows faster than the optimal number of training steps
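The power-law relation in the first bullet can be sketched as a straight-line fit in log-log space. A minimal sketch with synthetic data (the coefficient and exponent here are illustrative, not values from any paper):

```python
import numpy as np

# Synthetic (compute, loss) pairs following an exact power law
# L(C) = a * C^(-b); a and b are illustrative placeholders.
compute = np.array([1e15, 1e16, 1e17, 1e18, 1e19])
a_true, b_true = 50.0, 0.12
loss = a_true * compute ** (-b_true)

# A power law is linear in log-log space: log L = log a - b * log C,
# so a least-squares line fit recovers the exponent.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
b_fit = -slope
a_fit = np.exp(intercept)

print(f"fitted exponent b = {b_fit:.3f}")
```

In practice the same fit is run over measured (compute, loss) points from SAE training sweeps; the fitted exponent summarizes how quickly loss falls with additional compute.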
SAE scaling laws quantify how much additional compute improves dictionary learning. For an SAE, compute usage is governed primarily by two hyperparameters: the number of features being learned and the number of steps used to train the autoencoder.
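Since SAE compute is dominated by the encoder and decoder matmuls, a rough budget can be estimated from those two hyperparameters. A minimal sketch, assuming dense matmuls dominate and using the common ~3x forward cost heuristic for forward plus backward (the function name and all numbers are illustrative assumptions):

```python
def sae_training_flops(d_model: int, n_features: int, n_tokens: int) -> int:
    """Rough training FLOPs for an SAE.

    Per token, the encoder and decoder each perform a d_model x n_features
    matmul (counted as 2 FLOPs per multiply-add); the backward pass is
    approximated as 2x the forward pass, hence the factor of 3.
    """
    forward_per_token = 2 * (2 * d_model * n_features)
    return 3 * forward_per_token * n_tokens

# Illustrative: a 768-dim residual stream with a 16x feature expansion,
# trained on 1B tokens.
print(sae_training_flops(768, 16 * 768, 10**9))
```

Because total FLOPs scale with the product of feature count and token count, a fixed budget can be traded between wider dictionaries and longer training, which is exactly the allocation the scaling laws above optimize.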
 

Recommendations