Meta SAE

Creator
Seonglae Cho
Created
2025 Jan 28 13:49
Edited
2025 Mar 8 12:28

Meta SAE features on SAE decoder

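MetaSAEs are sparse autoencoders (SAEs) trained on the decoder directions of another SAE: each row of the base SAE's decoder matrix (the dictionary axis, one direction per latent) is treated as one training sample.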
$$\hat{x} = b^{dec} + \sum_{i=1}^{F} f_i(x) W_{\cdot,i}^{dec}$$

Shapes:

$$f(x) \in \mathbb{R}^{1 \times dictionary\_size}$$
$$W^{dec} \in \mathbb{R}^{dictionary\_size \times dimension\_size}$$
$$f_i(x) \in \mathbb{R}^{1 \times 1}$$
$$W_{\cdot,i}^{dec} \in \mathbb{R}^{1 \times dimension\_size}$$
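The shapes above can be checked with a small NumPy sketch. It first reconstructs $\hat{x}$ from sparse latents and the base decoder, then feeds the decoder directions themselves into a meta SAE as its dataset. Everything here is illustrative: the sizes, the random (untrained) weights, the ReLU meta encoder, the unit-norm scaling of the decoder rows, and names such as `W_meta_enc` and `meta_dictionary_size` are assumptions for the example, not the setup used in the referenced work.

```python
import numpy as np

rng = np.random.default_rng(0)

dictionary_size = 8  # F, number of base SAE latents (illustrative value)
dimension_size = 4   # width of the activation being reconstructed (illustrative value)

# Base SAE decoder stored as dictionary_size x dimension_size,
# so row i is the decoder direction of latent i.
W_dec = rng.normal(size=(dictionary_size, dimension_size))
b_dec = rng.normal(size=(dimension_size,))

# Sparse latent activations f(x), shape 1 x dictionary_size.
f_x = np.zeros((1, dictionary_size))
f_x[0, [1, 5]] = [0.7, 1.3]

# x_hat = b_dec + sum_i f_i(x) * (decoder direction i), written two ways.
x_hat_sum = b_dec + sum(f_x[0, i] * W_dec[i] for i in range(dictionary_size))
x_hat_mat = b_dec + f_x @ W_dec  # shape 1 x dimension_size

# Meta SAE input: every decoder direction is one training sample.
# Scaling rows to unit norm is an assumption made for this sketch.
meta_inputs = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)

# Hypothetical, untrained meta SAE with a plain ReLU encoder.
meta_dictionary_size = 3
W_meta_enc = rng.normal(size=(dimension_size, meta_dictionary_size))
W_meta_dec = rng.normal(size=(meta_dictionary_size, dimension_size))
b_meta_dec = np.zeros(dimension_size)

# Meta latents: one row per base latent, one column per meta latent.
meta_latents = np.maximum(0.0, (meta_inputs - b_meta_dec) @ W_meta_enc)
# Each base decoder direction is approximated by a combination of meta decoder directions.
meta_recon = b_meta_dec + meta_latents @ W_meta_dec
```

The point of the sketch is the change of dataset: the base SAE decomposes activations `x`, while the meta SAE decomposes the rows of `W_dec`, so `meta_latents` has one row per base latent.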

Meta Latents

https://arxiv.org/pdf/2502.04878
 
 
 

MetaSAE

weight
meta feature explorer
 
 
