
MMCS

Creator: Seonglae Cho
Created: 2025 Apr 14 21:2
Editor: Seonglae Cho
Edited: 2025 Apr 14 21:4

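Mean max cosine similarity: for each feature in one dictionary, take its highest cosine similarity with any feature in another dictionary, then average over features.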

 
 
 
 
[Interim research report] Taking features out of superposition with sparse autoencoders — AI Alignment Forum
https://www.alignmentforum.org/posts/z6QQJbtpkEAX3Aojj/interim-research-report-taking-features-out-of-superposition
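
As a rough illustration of mean max cosine similarity, the sketch below compares two feature dictionaries stored as NumPy arrays with one feature per row. The function name `mmcs`, the row-wise layout, and the `eps` guard are assumptions made for this example, not code from the linked report.

```python
import numpy as np

def mmcs(dict_a, dict_b, eps=1e-12):
    """Mean over rows of dict_a of the max cosine similarity to any row of dict_b."""
    # Normalize every feature vector to unit length (eps avoids division by zero).
    a = dict_a / (np.linalg.norm(dict_a, axis=1, keepdims=True) + eps)
    b = dict_b / (np.linalg.norm(dict_b, axis=1, keepdims=True) + eps)
    cos = a @ b.T                      # pairwise cosine similarities, shape (n_a, n_b)
    return float(cos.max(axis=1).mean())

# Hypothetical data: a dictionary compared with a rescaled, shuffled copy of itself.
rng = np.random.default_rng(0)
learned = rng.normal(size=(8, 16))
shuffled = 3.0 * learned[rng.permutation(8)]
score = mmcs(learned, shuffled)        # close to 1.0: every feature has an exact match
```

The score ignores feature order and scale, and it is not symmetric: `mmcs(a, b)` asks how well `b` covers the features of `a`, not the reverse.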

ML2R (mean L2 Ratio)

arxiv.org
https://arxiv.org/pdf/2501.14926
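
This note does not define ML2R, so the following is only a sketch under an assumed definition: each reference vector is matched to the learned vector with the highest cosine similarity, and the ratio of their L2 norms (learned over reference) is averaged. The function name `ml2r`, the matching rule, and the row-wise array layout are all assumptions for illustration.

```python
import numpy as np

def ml2r(learned, reference, eps=1e-12):
    """Mean ratio of L2 norms between each reference row and its best-matching learned row."""
    learned_norm = np.linalg.norm(learned, axis=1)
    reference_norm = np.linalg.norm(reference, axis=1)
    # Cosine similarity between every reference row and every learned row.
    cos = (reference / (reference_norm[:, None] + eps)) @ (
        learned / (learned_norm[:, None] + eps)
    ).T
    match = cos.argmax(axis=1)         # assumed matching rule: max cosine similarity
    return float((learned_norm[match] / (reference_norm + eps)).mean())

# Hypothetical data: learned vectors are the reference vectors shrunk by half.
rng = np.random.default_rng(0)
reference = rng.normal(size=(8, 16))
ratio = ml2r(0.5 * reference, reference)   # close to 0.5
```

Under this reading, a value near 1 would mean the learned vectors have about the same magnitude as their matches, while a value below 1 would indicate shrinkage. Cosine similarity alone cannot show this, since it discards scale.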
 
 

Copyright Seonglae Cho