Scaling Law for SAEs
- Loss decreases approximately as a power law with compute (see the power-law sketch after this list)
- As compute increases, the optimal allocation of FLOPS to both training steps and number of features also grows approximately as a power law
- At the tested compute budgets, the optimal number of features scales somewhat faster than the optimal number of training steps
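A minimal sketch of the kind of power-law relation described above, assuming hypothetical (compute, loss) measurements taken at compute-optimal settings; the constants and exponent are illustrative, not the paper's fitted values.

```python
import numpy as np

# Hypothetical (FLOPS, loss) pairs from compute-optimal sweeps over
# feature counts and training steps; values are illustrative only.
compute = np.array([1e15, 1e16, 1e17, 1e18, 1e19])
loss = np.array([0.52, 0.31, 0.19, 0.11, 0.066])

# Fit loss ≈ a * compute^(-b) via linear regression in log-log space.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(intercept), -slope
print(f"loss ≈ {a:.3g} * C^(-{b:.3g})")

# Extrapolate the fitted power law to a larger compute budget.
print(f"predicted loss at 1e20 FLOPS: {a * 1e20 ** (-b):.3g}")
```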
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
We find a diversity of highly abstract features. They both respond to and behaviorally cause abstract behaviors. Examples of features we find include features for famous people, features for countries and cities, and features tracking type signatures in code. Many features are multilingual (responding to the same concept across languages) and multimodal (responding to the same concept in both text and images), as well as encompassing both abstract and concrete instantiations of the same idea (such as code with security vulnerabilities, and abstract discussion of security vulnerabilities).
https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html#assessing-tour-influence/
These scaling laws describe the extent to which additional compute improves dictionary learning results. In an SAE, compute usage depends primarily on two hyperparameters: the number of features being learned and the number of steps used to train the autoencoder.
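As a rough illustration of how those two hyperparameters set the compute budget, the sketch below estimates training FLOPS with the common ~6 × parameters × examples rule of thumb; the function name, the rule, and all numbers are assumptions for illustration, not the paper's accounting.

```python
def sae_training_flops(d_model: int, n_features: int,
                       n_steps: int, batch_size: int) -> float:
    """Rough FLOPS estimate for training a sparse autoencoder.

    Assumes ~6 * parameters * examples for a forward + backward pass;
    the encoder and decoder each hold d_model * n_features weights.
    """
    params = 2 * d_model * n_features   # encoder + decoder weight matrices
    examples = n_steps * batch_size     # activations seen during training
    return 6 * params * examples

# Compute scales linearly in both the number of features and the number
# of training steps (hypothetical settings).
print(f"{sae_training_flops(4096, 1_000_000, 200_000, 4096):.2e} FLOPS")
```

Doubling either the number of features or the number of training steps doubles this estimate, which is why the compute-optimal trade-off between the two matters.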

Seonglae Cho