MetaSAE features on SAE decoder directions
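MetaSAEs are sparse autoencoders (SAEs) trained on the decoder directions of another SAE. Each row of that SAE's decoder matrix (one row per latent) is treated as an input vector, and the MetaSAE decomposes it into a sparse combination of meta latents.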
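A MetaSAE is an ordinary SAE whose training examples happen to be the decoder rows of a base SAE. The sketch below shows only the forward pass over those rows, in dependency-free Python. The ReLU encoder, the identity weights, and the names `sae_forward`, `vec_mat`, `base_W_dec`, and `meta` are illustrative assumptions, not the authors' setup, and training (with its sparsity penalty) is omitted.

```python
# Minimal MetaSAE forward-pass sketch (pure Python, no dependencies).
# Assumptions (not from the source): ReLU encoder acts = ReLU(v W_enc + b_enc),
# linear decoder v_hat = b_dec + acts W_dec, and tiny hand-picked weights.

def vec_mat(v, M):
    """Multiply a row vector v (1 x n) by a matrix M (n x m), giving 1 x m."""
    return [sum(v[k] * M[k][j] for k in range(len(v))) for j in range(len(M[0]))]

def sae_forward(v, W_enc, b_enc, W_dec, b_dec):
    """Return (latent activations, reconstruction) for one input row v."""
    pre = [a + b for a, b in zip(vec_mat(v, W_enc), b_enc)]
    acts = [max(0.0, a) for a in pre]  # ReLU
    recon = [a + b for a, b in zip(b_dec, vec_mat(acts, W_dec))]
    return acts, recon

# Base SAE decoder: dictionary_size = 4 rows, dimension_size = 3 columns.
# Each row is one base latent's decoder direction.
base_W_dec = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],  # base latent 2 looks like a mix of latents 0 and 1
    [0.0, 0.0, 1.0],
]

# Hypothetical MetaSAE with 3 meta latents (identity weights for readability;
# a real MetaSAE would learn these).
I3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
meta = dict(W_enc=I3, b_enc=[0.0] * 3, W_dec=I3, b_dec=[0.0] * 3)

# The MetaSAE's "dataset" is simply the rows of the base decoder.
meta_acts, meta_recons = [], []
for row in base_W_dec:
    acts, recon = sae_forward(row, **meta)
    meta_acts.append(acts)
    meta_recons.append(recon)

print(meta_acts[2])  # → [1.0, 1.0, 0.0]
```

Here base latent 2's decoder row activates meta latents 0 and 1, which is the kind of decomposition a trained MetaSAE is meant to surface.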
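$$\hat{x} = b^{\mathrm{dec}} + \sum_{i=1}^{F} f_i(x)\, W^{\mathrm{dec}}_{i,\cdot}$$
where F = dictionary_size and the shapes are:
f(x): 1 × dictionary_size
W^dec: dictionary_size × dimension_size
f_i(x): 1 × 1 (a scalar)
W^dec_{i,·}: 1 × dimension_size (row i, the decoder direction of latent i)
[Figure: MetaSAE diagram (labels: Meta Latents, MetaSAE, weight)]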
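To make the shapes concrete, here is a hedged sketch of the base SAE reconstruction x̂ = b_dec + Σ_i f_i(x) W^dec_i, taking decoder direction i to be row i of a dictionary_size × dimension_size W_dec, as the listed shapes imply. The encoder is omitted, and f(x), W_dec, and b_dec are made-up values; the point is only that the per-latent weighted sum equals the matrix product f(x) W_dec.

```python
# Sketch of the base SAE reconstruction using the shapes listed above.
# Assumption: f(x) is already computed (encoder omitted); all values are made up.

dictionary_size, dimension_size = 4, 3

# dictionary_size x dimension_size; row i = decoder direction of latent i
W_dec = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
b_dec = [0.5, 0.5, 0.5]     # 1 x dimension_size
f_x = [0.0, 2.0, 0.0, 3.0]  # 1 x dictionary_size, sparse activations f(x)

# Sum form: x_hat = b_dec + sum_i f_i(x) * W_dec[i, :]  (each f_i(x) is a scalar)
x_hat_sum = list(b_dec)
for i in range(dictionary_size):
    for j in range(dimension_size):
        x_hat_sum[j] += f_x[i] * W_dec[i][j]

# Matrix form: x_hat = b_dec + f(x) W_dec
# (1 x dictionary_size) times (dictionary_size x dimension_size) -> 1 x dimension_size
x_hat_mat = [
    b_dec[j] + sum(f_x[i] * W_dec[i][j] for i in range(dictionary_size))
    for j in range(dimension_size)
]

print(x_hat_sum)  # → [0.5, 2.5, 3.5]
```

Only latents 1 and 3 are active, so x̂ is the bias plus 2 times row 1 plus 3 times row 3; both forms give the same 1 × dimension_size vector.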