SAE Training

Creator: Seonglae Cho
Created: 2025 Jan 21 13:28
Edited: 2025 Mar 8 12:29

Loss

The loss combines an L2 reconstruction term with a sparsity penalty (typically an L1 penalty on the latent activations). Sometimes a decoder weight sparsity term is also included in the loss, so that the decoder side contributes to sparsity as well.
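A minimal PyTorch sketch of this objective; the function name and coefficients (`l1_coeff`, `decoder_l1_coeff`) are illustrative, not from the source:

```python
import torch
import torch.nn.functional as F

def sae_loss(x, x_hat, latents, decoder_weight,
             l1_coeff=1e-3, decoder_l1_coeff=0.0):
    """L2 reconstruction + L1 sparsity on latents, with an optional
    decoder-weight sparsity term as mentioned above."""
    recon = F.mse_loss(x_hat, x)                 # L2 reconstruction term
    sparsity = latents.abs().sum(dim=-1).mean()  # L1 penalty on activations
    loss = recon + l1_coeff * sparsity
    if decoder_l1_coeff > 0:
        # Optional: also penalize decoder weight magnitudes.
        loss = loss + decoder_l1_coeff * decoder_weight.abs().sum()
    return loss
```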

Hyperparameters

  • expansion_factor: the ratio of the SAE latent dimension to the input activation dimension
The decoder bias is sometimes subtracted from the input activation to center it, which helps the SAE learn meaningful features (see the sketch below).
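A sketch of how these two choices fit together, assuming a standard ReLU SAE; the class name and the default expansion factor are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """ReLU SAE whose latent dimension is expansion_factor * d_in."""

    def __init__(self, d_in: int, expansion_factor: int = 8):
        super().__init__()
        d_latent = expansion_factor * d_in
        self.encoder = nn.Linear(d_in, d_latent)
        self.decoder = nn.Linear(d_latent, d_in)

    def forward(self, x):
        # Subtract the decoder bias to center the input (pre-encoder bias).
        x_centered = x - self.decoder.bias
        latents = F.relu(self.encoder(x_centered))
        x_hat = self.decoder(latents)
        return x_hat, latents
```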
SAE Training Techniques

SAE Training Factors

The SAE is trained with an L2 reconstruction loss, with several hyperparameters to tune.
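A sketch of the resulting training loop, reusing the SparseAutoencoder and sae_loss sketches above; the hyperparameter values and the random-data stand-in are placeholders, not recommendations:

```python
import torch

sae = SparseAutoencoder(d_in=768, expansion_factor=8)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

for step in range(1000):
    acts = torch.randn(256, 768)  # stand-in for a batch of model activations
    x_hat, latents = sae(acts)
    loss = sae_loss(acts, x_hat, latents, sae.decoder.weight, l1_coeff=1e-3)
    opt.zero_grad()
    loss.backward()
    opt.step()
```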

Techniques
From OpenAI's Scaling and evaluating sparse autoencoders (Gao et al., 2024): "Specifically, we use a layer 5/6 of the way into the network for GPT-4 series models, and we use layer 8 (3/4 of the way) for GPT-2 small. We use a context length of 64 tokens for all experiments."
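A sketch of capturing such activations for GPT-2 small with Hugging Face transformers; the example text is arbitrary:

```python
import torch
from transformers import AutoTokenizer, GPT2Model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

tokens = tokenizer("An example sentence for activation capture.",
                   return_tensors="pt", truncation=True, max_length=64)
with torch.no_grad():
    out = model(**tokens, output_hidden_states=True)

# hidden_states[0] holds the token embeddings; hidden_states[8] is the
# residual stream after block 8, the GPT-2 small layer cited above.
acts = out.hidden_states[8]
```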
Recommendations