
Final LessWrong first post

Date
2025 Feb 25 0:0
Created by
Seonglae Cho
Created time
2025 Feb 25 13:5
Last edited by
Seonglae Cho
Last edited time
2025 Feb 26 14:36
Refs
The decoder weights are directly shaped by the L2 reconstruction loss, which pushes the decoder to use the features as much as possible. The encoder matrix, in contrast, is pressured by the L1 sparsity loss on the feature vector. This keeps the encoder weights from representing the features fully, so it "shrinks" their representational capacity, as above.

The same explanation applies to the neuron similarity of each weight matrix. Because the encoder focuses more on neurons than on sparse features, it has to represent more about the neurons. Likewise, the decoder focuses more on the features and pays relatively less attention to the neurons themselves.

The overall low cosine similarity has two causes. First, the weight vector of each neuron has a dimension equal to the dictionary size, so the curse of dimensionality reduces the chance that two vectors point in the same direction. Second, the many orphan features separated out by SAEs cause mismatches between vectors.
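The two pressures described above can be made concrete with a small sketch. It assumes a bias-free, single-layer SAE with a ReLU encoder, which may differ from the setup actually used here; the names `sae_loss`, `l1_coeff`, and `mean_abs_cosine` are hypothetical and chosen only for illustration. In this simplified form the L1 term depends only on the encoder output, while the decoder appears only in the reconstruction term. The second function uses random Gaussian vectors as a stand-in for weight vectors, to show how the typical cosine similarity falls as the dimension grows.

```python
import math
import random


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sae_loss(x, W_enc, W_dec, l1_coeff=1e-3):
    """Toy SAE loss for one activation vector x (biases omitted, an assumption).

    W_enc: dict_size rows of length n_neurons (encoder).
    W_dec: n_neurons rows of length dict_size (decoder).
    Returns (total, l2, l1).
    """
    f = [max(0.0, a) for a in matvec(W_enc, x)]        # sparse feature vector
    x_hat = matvec(W_dec, f)                           # reconstruction
    l2 = sum((a - b) ** 2 for a, b in zip(x, x_hat))   # reconstruction term
    l1 = sum(abs(a) for a in f)                        # sparsity term, no W_dec in it
    return l2 + l1_coeff * l1, l2, l1


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(dim, pairs=100, seed=0):
    """Average |cosine| between independent random Gaussian vectors of size dim."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(cosine(u, v))
    return total / pairs


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(sae_loss([1.0, 2.0], identity, identity))
    # Random directions become closer to orthogonal as the dimension grows.
    for dim in (8, 64, 1024):
        print(dim, round(mean_abs_cosine(dim), 4))
```

Changing `W_dec` in this sketch alters only the reconstruction term, which is one way to read the claim that the sparsity pressure falls on the encoder. The random-vector comparison is only a baseline for the dimensionality effect and says nothing about orphan features.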
 
Universality Hypothesis (Chughtai et al., 2023; Bricken et al., 2023).
 
 
