Meta SAE

Creator

Seonglae Cho

Created

2025 Jan 28 13:49

Editor

Seonglae Cho

Edited

2025 Mar 8 12:28

Refs

Sparse Autoencoder

Meta SAE features on SAE decoder

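MetaSAEs are sparse autoencoders (SAEs) trained on the decoder directions of another SAE (the y-axis in the figure below) rather than on model activations: each latent's decoder vector is treated as one training example, and the resulting meta-latents describe how those directions decompose into shared components.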
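To make the setup concrete, here is a minimal NumPy sketch of a meta-SAE forward pass over a base SAE's decoder matrix. It is an illustration under stated assumptions, not the authors' implementation: the sizes are toy values, the decoder matrix is random rather than loaded from a trained SAE, the parameters are untrained, and the per-row top-k activation is one plausible choice (the `topk_4` in the linked weight name hints at about 4 active meta-latents, but that reading is an assumption). All names such as `W_dec`, `W_enc_meta`, and `meta_sae_forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, far smaller than a real SAE.
n_latents, d_model = 512, 64   # base SAE: number of latents, residual width
n_meta, k = 32, 4              # meta-SAE: number of meta-latents, active per input

# Stand-in for the base SAE decoder matrix (assumption: random, unit-norm rows).
# Each row is one latent's decoder direction, i.e. one meta-SAE training example.
W_dec = rng.standard_normal((n_latents, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)

# Meta-SAE parameters, randomly initialised and untrained.
W_enc_meta = rng.standard_normal((d_model, n_meta)) * 0.1
W_dec_meta = rng.standard_normal((n_meta, d_model))
W_dec_meta /= np.linalg.norm(W_dec_meta, axis=1, keepdims=True)
b_dec_meta = np.zeros(d_model)


def meta_sae_forward(x):
    """Encode decoder directions with ReLU + per-row top-k, then reconstruct."""
    pre = np.maximum((x - b_dec_meta) @ W_enc_meta, 0.0)
    # Indices of the (n_meta - k) smallest pre-activations in each row.
    drop = np.argpartition(pre, -k, axis=1)[:, :-k]
    acts = pre.copy()
    np.put_along_axis(acts, drop, 0.0, axis=1)
    recon = acts @ W_dec_meta + b_dec_meta
    return acts, recon


# The "dataset" is the decoder matrix itself, not model activations.
meta_acts, recon = meta_sae_forward(W_dec)
mse = float(np.mean((recon - W_dec) ** 2))
print(meta_acts.shape, mse)
```

Training would minimise `mse` over the meta-SAE parameters; after training, the nonzero entries in a row of `meta_acts` indicate which meta-latents compose that base latent.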

notion image

Meta Latents

https://arxiv.org/pdf/2502.04878

MetaSAE

https://openreview.net/pdf?id=9ca9eHNrdH

Showing SAE Latents Are Not Atomic Using Meta-SAEs — LessWrong

Bart, Michael and Patrick are joint first authors. Research conducted as part of MATS 6.0 in Lee Sharkey and Neel Nanda’s streams. Thanks to Mckenna…

https://www.lesswrong.com/posts/TMAmHh4DdMr4nCSr5/

weight

pleask/gpt2-small_blocks.8.hook_resid_pre_2304_topk_4_0.001_2000 at main

https://huggingface.co/pleask/gpt2-small_blocks.8.hook_resid_pre_2304_topk_4_0.001_2000/tree/main

meta feature explorer

public dashboard

https://metasae.streamlit.app/?page=Meta+Feature+Explorer&meta_feature=228
