Deep SAE

Creator: Seonglae Cho
Created: 2025 Feb 25 18:36
Editor: Seonglae Cho
Edited: 2025 Feb 25 18:36
Refs
Deep sparse autoencoders yield interpretable features too (LessWrong)
The author sandwiches the sparse layer of a sparse autoencoder (SAE) between non-sparse, lower-dimensional layers and calls the result a deep SAE.
https://www.lesswrong.com/posts/tLCBJn3NcSNzi5xng/deep-sparse-autoencoders-yield-interpretable-features-too#SAE_depth_improves_the_reconstruction_sparsity_frontier
arxiv.org
https://arxiv.org/pdf/2411.13117
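
The sandwich structure described in the LessWrong reference can be sketched as a forward pass. This is a minimal, dependency-free illustration and not the author's implementation: the class name `DeepSAE`, the layer sizes, the ReLU activations, the top-k sparsity rule, and the random initialization are all assumptions made for the example. "Lower-dimensional" is read here as narrower than the sparse layer.

```python
import random


def linear(weights, bias, x):
    """Affine map: weights is a list of rows, x is a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def top_k(x, k):
    """Keep the k largest entries and zero the rest (assumed sparsity rule)."""
    keep = set(sorted(range(len(x)), key=lambda i: x[i], reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def init_layer(rng, n_out, n_in):
    """Random weights with zero bias (illustrative initialization only)."""
    scale = n_in ** -0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class DeepSAE:
    """Hypothetical deep SAE: dense -> wide sparse layer -> dense -> output."""

    def __init__(self, d_model=8, d_hidden=16, d_sparse=64, k=4, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Non-sparse layer before the sparse layer (narrower than d_sparse).
        self.enc_dense = init_layer(rng, d_hidden, d_model)
        # Wide sparse layer: only k of d_sparse features stay active.
        self.enc_sparse = init_layer(rng, d_sparse, d_hidden)
        # Non-sparse layer after the sparse layer.
        self.dec_dense = init_layer(rng, d_hidden, d_sparse)
        # Final projection back to the input dimension.
        self.dec_out = init_layer(rng, d_model, d_hidden)

    def encode(self, x):
        h = relu(linear(*self.enc_dense, x))
        return top_k(relu(linear(*self.enc_sparse, h)), self.k)

    def decode(self, z):
        h = relu(linear(*self.dec_dense, z))
        return linear(*self.dec_out, h)

    def forward(self, x):
        z = self.encode(x)
        return z, self.decode(z)


sae = DeepSAE()
x = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 2.0, 0.1]
z, x_hat = sae.forward(x)
# Number of active sparse features, sparse width, reconstruction width.
print(sum(v != 0.0 for v in z), len(z), len(x_hat))
```

In a shallow SAE the input would map directly to the sparse layer and back. Here the sparse code `z` is still the layer one would inspect for interpretable features, while the dense layers on either side add depth. Training (reconstruction loss, optimizer) is omitted from this sketch.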
 
 

Copyright Seonglae Cho