SAE weight initialization

Creator
Seonglae Cho
Created
2025 Feb 12 22:41
Edited
2025 Apr 6 19:5

Weight initialization

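Weights are drawn from a uniform distribution, and the encoder and decoder matrices are initialized as transposes of each other. If the SAE is a transcoder, the encoder matrix is also drawn from a uniform distribution rather than tied to the decoder by transposition.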
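A minimal sketch of this scheme in NumPy, under stated assumptions. The function name init_sae_weights, the default bound of 1/sqrt(d_in), and the shape convention (encoder as d_in x d_dict, decoder as d_dict x d_out) are illustrative choices, not taken from the note.

```python
import numpy as np

def init_sae_weights(d_in, d_dict, d_out=None, transcoder=False, bound=None, seed=0):
    """Return (W_enc, W_dec) with W_enc: (d_in, d_dict), W_dec: (d_dict, d_out).

    Hypothetical helper: the uniform bound and the shapes are assumptions.
    """
    rng = np.random.default_rng(seed)
    if bound is None:
        bound = 1.0 / np.sqrt(d_in)  # assumed scale, not from the source
    if not transcoder:
        # Plain SAE: input and output spaces match, so the encoder can be
        # initialized as the transpose of the uniformly sampled decoder.
        W_dec = rng.uniform(-bound, bound, size=(d_dict, d_in))
        W_enc = W_dec.T.copy()  # copy so the two matrices can train separately
    else:
        # Transcoder: the encoder gets its own uniform draw.
        d_out = d_in if d_out is None else d_out
        W_dec = rng.uniform(-bound, bound, size=(d_dict, d_out))
        W_enc = rng.uniform(-bound, bound, size=(d_in, d_dict))
    return W_enc, W_dec
```

The copy on the transpose means the tie holds only at initialization; keeping the weights tied during training would be a separate design decision.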

Bias initialization

Use the geometric median of the input activations as the initial bias.
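The geometric median has no closed form, so it is usually approximated iteratively. The sketch below uses Weiszfeld's algorithm, which is one common choice and not necessarily what any particular SAE codebase uses. The function name geometric_median and its parameters n_iter and eps are hypothetical.

```python
import numpy as np

def geometric_median(x, n_iter=100, eps=1e-8):
    """Approximate the geometric median of the rows of x (n_samples, d).

    Weiszfeld iteration: repeatedly take a mean weighted by inverse distance.
    """
    y = x.mean(axis=0)  # start from the arithmetic mean
    for _ in range(n_iter):
        dist = np.linalg.norm(x - y, axis=1)
        w = 1.0 / np.maximum(dist, eps)  # clamp to avoid division by zero
        y_new = (w[:, None] * x).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            return y_new
        y = y_new
    return y
```

Unlike the mean, the result is robust to outliers: for three points at the origin and one far away, the estimate stays near the origin while the mean is pulled toward the outlier.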
Pass a portion of the data through the model to measure the distribution of pre-activation values for each feature. Then use a threshold on that distribution to set each feature's bias so that the feature activates at approximately a target rate across the entire dataset, where the rate is defined in terms of m, the dictionary size, and l, the activated count of pre-activations.
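The calibration step can be sketched as a per-feature quantile computation. This assumes a ReLU encoder of the form ReLU(x W_enc + b), so a feature fires exactly when its pre-activation exceeds -b. The function name calibrate_encoder_bias is hypothetical, and target_rate is left as a free parameter here because the note does not give the exact expression in m and l.

```python
import numpy as np

def calibrate_encoder_bias(x, W_enc, target_rate):
    """Choose a bias per feature so it fires on about target_rate of samples.

    x: (n_samples, d_in) sample of activations; W_enc: (d_in, d_dict).
    Assumes the encoder computes ReLU(x @ W_enc + b).
    """
    pre = x @ W_enc  # pre-activations with zero bias, shape (n_samples, d_dict)
    # Threshold that a fraction target_rate of samples exceeds, per feature.
    thresh = np.quantile(pre, 1.0 - target_rate, axis=0)
    # pre + b > 0 exactly when pre > thresh if b = -thresh.
    return -thresh
```

A quick check is to recompute ((x @ W_enc + b) > 0).mean(axis=0) on the calibration sample and confirm it is close to target_rate for every feature.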
 
 
