SAE Training Duality

Creator
Seonglae Cho
Created
2025 Mar 10 16:5
Edited
2025 Mar 10 16:7
SAE training can be viewed as a bilevel optimization problem: the outer optimization minimizes reconstruction error plus a sparsity regularizer, while the inner optimization finds the optimal projection of the encoder's output onto a constraint set.
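The two levels can be sketched in a few lines of NumPy. This is a minimal illustration, not the formulation of any particular paper or library: the function names (`project_nonneg`, `project_topk_nonneg`, `outer_loss`), the L1 penalty, and the weight `lam` are all assumptions chosen for the example. It relies on two standard facts: ReLU is the Euclidean projection onto the nonnegative orthant, and keeping the k largest positive entries is the projection onto nonnegative vectors with at most k nonzeros.

```python
import numpy as np

def project_nonneg(v):
    """Inner problem for a ReLU encoder: argmin over z >= 0 of ||z - v||^2, i.e. ReLU(v)."""
    return np.maximum(v, 0.0)

def project_topk_nonneg(v, k):
    """Inner problem for a TopK encoder: nearest nonnegative vector with at most k nonzeros."""
    z = np.maximum(v, 0.0)  # new array, safe to modify in place
    if k < z.size:
        # Zero out everything except the k largest entries.
        drop = np.argsort(z)[: z.size - k]
        z[drop] = 0.0
    return z

def outer_loss(x, W_enc, b_enc, W_dec, b_dec, project, lam=1e-3):
    """Outer objective: squared reconstruction error plus an (assumed) L1 sparsity penalty."""
    v = W_enc @ x + b_enc      # encoder pre-activations
    z = project(v)             # solution of the inner projection problem
    x_hat = W_dec @ z + b_dec  # linear decoder
    return np.sum((x - x_hat) ** 2) + lam * np.sum(np.abs(z))

# Toy setup: hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d, m = 8, 32
x = rng.normal(size=d)
W_enc = rng.normal(size=(m, d)) / np.sqrt(d)
W_dec = rng.normal(size=(d, m)) / np.sqrt(m)
b_enc, b_dec = np.zeros(m), np.zeros(d)

relu_loss = outer_loss(x, W_enc, b_enc, W_dec, b_dec, project_nonneg)
topk_loss = outer_loss(x, W_enc, b_enc, W_dec, b_dec,
                       lambda v: project_topk_nonneg(v, k=4))
```

Swapping the `project` argument changes only the constraint set of the inner problem; the outer objective stays the same, which is what makes the choice of encoder nonlinearity a modeling assumption in its own right.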

Duality

There is a fundamental duality between how concepts are organized in model representations and how an SAE encoder's receptive fields should be structured to optimally identify those concepts. Crucially, this implies that any SAE is implicitly biased towards identifying concepts that are organized in a specific manner.
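As a rough illustration of how the encoder shapes receptive fields, the sketch below compares which latents fire for the same input direction at two different scales. Everything here is an assumption made for the example (the random unit-norm weights `W`, the negative bias `b`, the helper names `relu_active` and `topk_active`, and the use of a bias-free TopK encoder). A ReLU latent with a bias fires on a half-space, so its activity depends on input magnitude, while a bias-free TopK latent fires based only on the ranking of pre-activations, which is unchanged by positive rescaling.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 4, 16, 3

# Hypothetical encoder weights with unit-norm rows.
W = rng.normal(size=(m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
b = -0.5 * np.ones(m)  # assumed negative bias for the ReLU encoder

def relu_active(x):
    """Latents whose half-space {x : w.x + b > 0} contains x."""
    return set(np.flatnonzero(W @ x + b > 0).tolist())

def topk_active(x):
    """Latents with the k largest pre-activations (bias-free TopK)."""
    return set(np.argsort(W @ x)[-k:].tolist())

x = W[0].copy()  # a unit-norm input aligned with latent 0

small, large = 0.01 * x, 10.0 * x

relu_small, relu_large = relu_active(small), relu_active(large)
topk_small, topk_large = topk_active(small), topk_active(large)
```

Here `relu_small` is empty and `relu_large` contains latent 0, while `topk_small` and `topk_large` coincide: the two encoders carve input space into differently shaped regions, so each is better matched to concepts laid out in a correspondingly different way.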
 
 
 
 
The Rate Distortion Dance between reconstruction (Compressed sensing) and sparsity (Interpretable Sparse Coding)
The Rate Distortion Dance of Sparse Autoencoders | Tilde
Overview: in this blog post, we are going to be setting some of the theoretical foundations and intuition for the problems we think about. Over the coming week, we will release different blog posts focused on specific experiments and empirical questions. As such, this post aims to lay the groundwork for what's to come. We're excited to share the tip of the iceberg!
 
 
