Switch SAE

Creator: Seonglae Cho
Created: 2025 Jan 21 13:21
Editor: Seonglae Cho
Edited: 2025 Mar 17 12:56
Refs: MoE
Switch SAE applies Mixture-of-Experts (MoE) routing to sparse autoencoders, scaling them to very high dictionary width while reducing the compute cost of training.
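A minimal sketch of the idea, assuming top-1 (switch) routing over expert TopK SAEs: a router assigns each activation to a single expert, so only that expert's slice of the full dictionary is encoded and decoded per input. Names such as SwitchSAE, num_experts, d_expert, and k are illustrative, not taken from the paper's code.

# Sketch of a Switch SAE: a router sends each activation to one expert SAE,
# so only a fraction of the total dictionary is evaluated per input.
# Hyperparameter names are illustrative, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchSAE(nn.Module):
    def __init__(self, d_model: int, d_expert: int, num_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # picks one expert per input
        self.enc = nn.Parameter(torch.randn(num_experts, d_model, d_expert) * 0.01)
        self.dec = nn.Parameter(torch.randn(num_experts, d_expert, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor):
        # x: (batch, d_model) residual-stream activations
        probs = F.softmax(self.router(x), dim=-1)        # (batch, num_experts)
        expert = probs.argmax(dim=-1)                    # top-1 (switch) routing
        gate = probs.gather(-1, expert.unsqueeze(-1))    # keeps the router differentiable
        x_cent = x - self.b_dec
        # Encode with the chosen expert only (gathered per example for clarity;
        # an efficient implementation would group examples by expert).
        enc_w = self.enc[expert]                         # (batch, d_model, d_expert)
        dec_w = self.dec[expert]                         # (batch, d_expert, d_model)
        pre = torch.bmm(x_cent.unsqueeze(1), enc_w).squeeze(1)
        # TopK activation: keep only the k largest pre-activations per example
        topk = torch.topk(pre, self.k, dim=-1)
        feats = torch.zeros_like(pre).scatter(-1, topk.indices, F.relu(topk.values))
        recon = torch.bmm(feats.unsqueeze(1), dec_w).squeeze(1) + self.b_dec
        return gate * recon, feats, probs                # output scaled by gate, as in Switch routing

# Usage: reconstruct activations; the paper additionally adds a load-balancing auxiliary loss.
sae = SwitchSAE(d_model=768, d_expert=2048, num_experts=8, k=32)
x = torch.randn(4, 768)
recon, feats, probs = sae(x)
loss = F.mse_loss(recon, x)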
Efficient Dictionary Learning with Switch Sparse Autoencoders
Sparse autoencoders (SAEs) are a recent technique for decomposing neural network activations into human-interpretable features. However, in order for SAEs to identify all features represented in...
https://arxiv.org/abs/2410.08201

Efficient Dictionary Learning with Switch Sparse Autoencoders — LessWrong
Produced as part of the ML Alignment & Theory Scholars Program - Summer 2024 Cohort …
https://www.lesswrong.com/posts/47CYFbrSyiJE2X5ot/efficient-dictionary-learning-with-switch-sparse
Copyright Seonglae Cho