Pre-bias gradient

Creator: Seonglae Cho
Created: 2025 Feb 3 12:58
Edited: 2025 Feb 3 13:09
Refs
The pre-bias gradient is computed with a trick: the pre-activation gradient is summed across the batch dimension before being multiplied with the encoder weights, which replaces a batched matrix product with a single vector-matrix product. Separately, the pre-encoder bias is constrained to equal the negative of the post-decoder bias (the input is shifted by it before encoding and it is added back after decoding) and is initialized to the geometric median of the dataset.
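A minimal sketch of the trick in PyTorch, assuming the tied-bias formulation above (pre_acts = (x - b_pre) @ W_enc.T + b_enc, recon = acts @ W_dec + b_pre); the tensor names and shapes are illustrative, not taken from any particular implementation:

```python
import torch

def pre_bias_grad(grad_recon: torch.Tensor,  # dL/d(recon),    shape [batch, d_model]
                  grad_pre: torch.Tensor,    # dL/d(pre_acts), shape [batch, n_latents]
                  W_enc: torch.Tensor        # encoder weights, shape [n_latents, d_model]
                  ) -> torch.Tensor:
    """Gradient of the loss w.r.t. the tied pre-bias b_pre (shape [d_model]).

    With pre_acts = (x - b_pre) @ W_enc.T + b_enc and recon = acts @ W_dec + b_pre,
    dL/db_pre = sum_batch dL/d(recon) - W_enc.T @ (sum_batch dL/d(pre_acts)).
    Summing over the batch *before* multiplying with the encoder weights reduces a
    [batch, n_latents] x [n_latents, d_model] product to one [n_latents]-by-matrix
    vector product.
    """
    return grad_recon.sum(dim=0) - grad_pre.sum(dim=0) @ W_enc
```
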
Each training step of a sparse autoencoder generally consists of six major computations (sketched in code after the list):
  1. the encoder forward pass
  2. the encoder gradient
  3. the decoder forward pass
  4. the decoder gradient
  5. the latent gradient
  6. the pre-bias gradient
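As a concrete illustration, here is a hedged sketch of these six computations for a plain ReLU SAE trained on an MSE reconstruction loss, with the backward pass written out by hand; the sparsity penalty is omitted for brevity and all tensor names and shapes are illustrative assumptions:

```python
import torch

def sae_step_manual(x, W_enc, b_enc, W_dec, b_pre):
    """One SAE training step with all six computations written out (no autograd).
    Shapes (illustrative): x [batch, d_model], W_enc/W_dec [n_latents, d_model],
    b_enc [n_latents], b_pre [d_model]. Numbers in comments refer to the list above;
    the gradients are naturally computed in reverse order during the backward pass."""
    # 1. encoder forward pass
    pre_acts = (x - b_pre) @ W_enc.T + b_enc           # [batch, n_latents]
    acts = torch.relu(pre_acts)

    # 3. decoder forward pass
    recon = acts @ W_dec + b_pre                       # [batch, d_model]
    loss = ((recon - x) ** 2).mean()

    # backward pass
    grad_recon = 2.0 * (recon - x) / recon.numel()     # dL/d(recon)

    # 4. decoder gradient
    grad_W_dec = acts.T @ grad_recon                   # [n_latents, d_model]

    # 5. latent gradient (through the ReLU)
    grad_acts = grad_recon @ W_dec.T                   # [batch, n_latents]
    grad_pre_acts = grad_acts * (pre_acts > 0).to(x.dtype)

    # 2. encoder gradient
    grad_W_enc = grad_pre_acts.T @ (x - b_pre)         # [n_latents, d_model]
    grad_b_enc = grad_pre_acts.sum(dim=0)              # [n_latents]

    # 6. pre-bias gradient: sum over the batch first, then one vector-matrix product
    grad_b_pre = grad_recon.sum(dim=0) - grad_pre_acts.sum(dim=0) @ W_enc

    return loss, (grad_W_enc, grad_b_enc, grad_W_dec, grad_b_pre)
```
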

Geometric median initialization
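The geometric median has no closed form, so the initialization is usually approximated iteratively on a sample of activations, for example with Weiszfeld's algorithm. A minimal sketch under that assumption (function name, iteration count, and sample handling are illustrative):

```python
import torch

def geometric_median(x: torch.Tensor, n_iter: int = 100, eps: float = 1e-6) -> torch.Tensor:
    """Approximate the geometric median of a sample of activations x [n, d_model]
    with Weiszfeld's algorithm; the result can be used to initialize b_pre."""
    median = x.mean(dim=0)                                     # start from the arithmetic mean
    for _ in range(n_iter):
        dist = torch.norm(x - median, dim=1).clamp_min(eps)    # distances to current estimate
        weights = 1.0 / dist                                   # inverse-distance weights
        median = (weights[:, None] * x).sum(dim=0) / weights.sum()
    return median
```
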

Insight of the pre-bias

Recommendations