Weight initialization
Draw the weights from a uniform distribution, with the encoder and decoder matrices initialized as transposes of each other. If the SAE is a transcoder, the encoder matrix is also initialized from a uniform distribution.
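A minimal NumPy sketch of this scheme follows. The function name `init_sae_weights`, the `1/sqrt(d_in)` range, and the use of one width `d_in` for both input and output are assumptions made for illustration, not values taken from these notes.

```python
import numpy as np

def init_sae_weights(d_in, d_sae, transcoder=False, seed=0):
    """Uniform init; encoder starts as the decoder transpose unless transcoder."""
    rng = np.random.default_rng(seed)
    bound = 1.0 / np.sqrt(d_in)  # assumed range, not specified in the notes
    # Decoder maps dictionary features (d_sae) back to model space (d_in).
    W_dec = rng.uniform(-bound, bound, size=(d_sae, d_in))
    if transcoder:
        # Transcoder: the encoder gets its own uniform draw.
        W_enc = rng.uniform(-bound, bound, size=(d_in, d_sae))
    else:
        # Standard SAE: the encoder is the transpose of the decoder.
        W_enc = W_dec.T.copy()
    return W_enc, W_dec
```

Copying the transpose, rather than sharing memory, lets the two matrices diverge during training while still starting out tied.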
Bias initialization
Use the geometric median for the bias.
Pass a portion of the data through the model to measure the distribution of pre-activation values for each feature. Then set each feature's bias by thresholding that distribution, so that the feature activates approximately a target number of times across the entire dataset. The target is set in terms of m and l, where m is the dictionary size and l is the activated count of the pre-activations.
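Both bias steps can be sketched in NumPy as below. Several things here are assumptions: the geometric median is taken over a sample of input activations and computed with Weiszfeld's iteration, and `target_rate` is a hypothetical parameter standing in for the target activation frequency, expressed as a fraction of samples. The function names are illustrative only.

```python
import numpy as np

def geometric_median(x, n_iter=100, eps=1e-8):
    """Weiszfeld iteration for the geometric median of the rows of x.

    x is assumed to be a sample of input activations, shape (n, d).
    """
    y = x.mean(axis=0)  # start from the arithmetic mean
    for _ in range(n_iter):
        dist = np.linalg.norm(x - y, axis=1)
        w = 1.0 / np.maximum(dist, eps)  # guard against division by zero
        y_next = (w[:, None] * x).sum(axis=0) / w.sum()
        converged = np.linalg.norm(y_next - y) < eps
        y = y_next
        if converged:
            break
    return y

def init_encoder_bias(pre_acts, target_rate):
    """Per-feature bias so each feature fires on about target_rate of samples.

    pre_acts: (n_samples, d_sae) pre-activations computed with zero bias.
    target_rate: hypothetical target firing fraction in (0, 1).
    """
    # The (1 - target_rate) quantile is the value exceeded by a
    # target_rate fraction of the samples, computed per feature.
    thresh = np.quantile(pre_acts, 1.0 - target_rate, axis=0)
    # Shifting by -thresh makes (pre_act + bias) > 0 at that rate.
    return -thresh
```

A quantile gives the threshold directly from the measured distribution, so no iterative search over bias values is needed.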