ReLU SAE

Creator: Seonglae Cho
Created: 2025 Mar 10 13:48
Editor: Seonglae Cho
Edited: 2025 Mar 27 22:18
Refs: ReLU, basic
[Interim research report] Taking features out of superposition with sparse autoencoders — LessWrong
https://www.lesswrong.com/posts/z6QQJbtpkEAX3Aojj/interim-research-report-taking-features-out-of-superposition

Sparse Autoencoders Find Highly Interpretable Features in Language Models (arXiv, 2023)
https://arxiv.org/pdf/2309.08600

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (Anthropic, 2023)
Using a sparse autoencoder, we extract a large number of interpretable features from a one-layer transformer.
https://transformer-circuits.pub/2023/monosemantic-features#setup-autoencoder-motivation
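
The ReLU SAE is the baseline sparse autoencoder architecture described in the references above: a single hidden layer that encodes a model activation as ReLU(W_enc(x − b_dec) + b_enc), decodes it linearly back to the activation space, and is trained with an L2 reconstruction loss plus an L1 sparsity penalty on the feature activations. Below is a minimal PyTorch sketch of that setup, not the exact training code from either paper; the class name ReLUSAE, the initialization scale, and the l1_coeff value are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ReLUSAE(nn.Module):
    """Minimal ReLU sparse autoencoder (illustrative sketch, not the
    original training code). d_hidden is typically several times d_model."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_hidden) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_hidden))
        self.W_dec = nn.Parameter(torch.randn(d_hidden, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor):
        # Encode: subtract the decoder bias, project up, ReLU -> sparse features
        f = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # Decode: reconstruct the activation as a sparse sum of dictionary rows
        x_hat = f @ self.W_dec + self.b_dec
        return x_hat, f


def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    # L2 reconstruction error plus L1 sparsity penalty on feature activations
    recon = (x_hat - x).pow(2).sum(dim=-1).mean()
    sparsity = f.abs().sum(dim=-1).mean()
    return recon + l1_coeff * sparsity


# Toy usage on random "activations" (shapes are illustrative)
sae = ReLUSAE(d_model=512, d_hidden=4096)
x = torch.randn(32, 512)
x_hat, f = sae(x)
loss = sae_loss(x, x_hat, f)
loss.backward()
```

One detail the sketch omits: in the Towards Monosemanticity setup the decoder rows are constrained to unit norm so the L1 penalty cannot be gamed by shrinking feature activations while growing decoder weights, so a full training loop would renormalize W_dec after each optimizer step.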
 
 
 

Copyright Seonglae Cho