Pre-bias gradient

Creator
Seonglae Cho
Created
2025 Feb 3 12:58
Edited
2025 Feb 3 13:09
Refs
The pre-bias gradient uses a trick: the pre-activation gradient is summed across the batch dimension before being multiplied by the encoder weights. In this setup, the pre-encoder bias is constrained to equal the negative of the post-decoder bias and is initialized to the geometric median of the dataset.
Each training step of a sparse autoencoder generally consists of six major computations:
  1. the encoder forward pass
  2. the encoder gradient
  3. the decoder forward pass
  4. the decoder gradient
  5. the latent gradient
  6. the pre-bias gradient
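A minimal NumPy sketch of the batch-sum trick for the encoder-path pre-bias gradient (all shapes and the tied-bias parameterization are illustrative assumptions, not the exact implementation of any particular codebase):

```python
import numpy as np

rng = np.random.default_rng(0)
B, d_model, d_sae = 8, 16, 64

W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
b_enc = np.zeros(d_sae)
b_pre = np.zeros(d_model)   # pre-encoder bias, tied to -b_dec

x = rng.normal(size=(B, d_model))

# Forward pass: encoder, then decoder (b_dec = -b_pre)
x_cent = x + b_pre
pre = x_cent @ W_enc + b_enc       # pre-activations, shape (B, d_sae)
z = np.maximum(pre, 0.0)           # ReLU latents
x_hat = z @ W_dec - b_pre

# Backward pass through the MSE reconstruction loss
g_xhat = 2.0 * (x_hat - x) / B     # gradient w.r.t. x_hat
g_z = g_xhat @ W_dec.T             # latent gradient
g_pre = g_z * (pre > 0)            # pre-activation gradient, shape (B, d_sae)

# Naive encoder-path pre-bias gradient: materializes a (B, d_model)
# intermediate, then reduces over the batch.
g_bpre_naive = (g_pre @ W_enc.T).sum(axis=0)

# Trick: sum the pre-activation gradient across the batch FIRST,
# then do a single vector-matrix product with the encoder weights.
g_bpre_fast = g_pre.sum(axis=0) @ W_enc.T

assert np.allclose(g_bpre_naive, g_bpre_fast)
```

The two orderings agree by linearity; the summed-first version avoids the (B, d_model) intermediate. Note the full gradient of b_pre would also pick up a term from the decoder output (since b_dec = -b_pre); only the encoder path is shown here.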

Geometric median initialization

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Mechanistic interpretability seeks to understand neural networks by breaking them into components that are more easily understood than the whole. By understanding the function of each component, and how they interact, we hope to be able to reason about the behavior of the entire network. The first step in that program is to identify the correct components to analyze.
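One standard way to compute the geometric median used for this initialization is Weiszfeld's iteratively re-weighted mean; the sketch below is a generic implementation of that algorithm, not necessarily the exact routine used in the cited work:

```python
import numpy as np

def geometric_median(X, iters=100, eps=1e-8):
    """Weiszfeld's algorithm: repeatedly take a distance-weighted mean,
    which converges to the point minimizing the sum of Euclidean
    distances to the rows of X."""
    y = X.mean(axis=0)  # start from the ordinary mean
    for _ in range(iters):
        d = np.linalg.norm(X - y, axis=1)
        w = 1.0 / np.maximum(d, eps)          # inverse-distance weights
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

# Example: initialize the decoder bias from a batch of activations
X = np.random.default_rng(0).normal(size=(100, 8))
b_dec_init = geometric_median(X)
```

Unlike the mean, the geometric median is robust to outlier activations, which is why it makes a reasonable initialization target for the decoder bias.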
insight of Pre-bias
More findings on Memorization and double descent — AI Alignment Forum
Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort. …
 
 