SAE Loss

Creator: Seonglae Cho
Created: 2025 Mar 7 12:28
Edited: 2025 Aug 2 1:31
Refs
A normalized version of all MSE numbers, where we divide by the baseline reconstruction error of always predicting the mean activations.
  • L0
  • L1
  • L2
  • KL
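A minimal PyTorch sketch of these loss terms (variable names such as `x`, `x_hat`, `f`, and `l1_coeff` are illustrative assumptions, not taken from any specific SAE codebase); the normalized MSE follows the definition above, dividing by the error of always predicting the mean activation:

```python
# Hedged sketch of the standard SAE loss terms; shapes assumed:
# x, x_hat: [batch, d_model] activations and reconstructions, f: [batch, n_latents] latents.
import torch

def sae_loss_terms(x: torch.Tensor, x_hat: torch.Tensor, f: torch.Tensor, l1_coeff: float = 1e-3):
    # L2 reconstruction error (MSE), summed over features, averaged over the batch
    mse = (x_hat - x).pow(2).sum(dim=-1).mean()

    # Normalized MSE: divide by the error of always predicting the mean activation,
    # so 0 means perfect reconstruction and 1 means no better than the mean baseline.
    baseline = (x - x.mean(dim=0, keepdim=True)).pow(2).sum(dim=-1).mean()
    normalized_mse = mse / baseline

    # L1 penalty: differentiable surrogate for sparsity, used during training
    l1 = f.abs().sum(dim=-1).mean()

    # L0: average number of active latents per sample (evaluation metric, not differentiable)
    l0 = (f != 0).float().sum(dim=-1).mean()

    loss = mse + l1_coeff * l1  # KL-based objectives are discussed further below
    return {"loss": loss, "mse": mse, "normalized_mse": normalized_mse, "l1": l1, "l0": l0}
```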
While L0 and L1 allow control of sparsity at the sample level,
SAE High Frequency Latent
features emerge because no pressure is applied to the overall
SAE Feature Distribution
across all samples, resulting in many latents that are sparse within each sample yet activate frequently across samples (5% of features activate more than 50% of the time). Therefore, if we develop approaches like
BatchTopK SAE
, we can provide incentives for different samples to use different features, or penalize overlapping feature usage. Through an independence loss or
Contrastive Learning
, we can achieve sparsity that holds globally across samples, as sketched below.
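A minimal sketch of the BatchTopK idea under stated assumptions (PyTorch; `pre` is a [batch, n_latents] matrix of encoder pre-activations): the top k × batch activations are kept across the whole batch rather than the top k per sample, so the sparsity budget is shared globally and different samples can use different numbers of latents.

```python
# Hedged sketch of BatchTopK-style sparsity (not a specific library's API).
import torch

def batch_topk(pre: torch.Tensor, k: int) -> torch.Tensor:
    acts = pre.relu()                          # non-negative pre-activations, [batch, n_latents]
    n_keep = k * acts.shape[0]                 # global budget: k active latents per sample on average
    threshold = torch.topk(acts.flatten(), n_keep).values.min()  # batch-wide k-th largest activation
    # Keep only activations at or above the batch-wide threshold (ties may keep a few extra)
    return torch.where(acts >= threshold, acts, torch.zeros_like(acts))
```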
 
 
 

Reconstruction dark matter within
Dictionary Learning

A significant portion of SAE reconstruction error can be linearly predicted, but there remains a nonlinear error component that does not decrease even as the SAE's size increases. Additional techniques are therefore needed to reduce this nonlinear error.
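As a rough illustration of the linear-prediction claim, the sketch below (PyTorch; `x` and `x_hat` as above, both assumed names) fits a least-squares map from the input activation to the SAE residual and reports the fraction of the error it explains; what remains is the nonlinear component.

```python
# Hedged sketch: how much of the SAE reconstruction error is linearly predictable?
import torch

def linear_error_probe(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    err = x - x_hat                                 # SAE residual, [n_samples, d_model]
    W = torch.linalg.lstsq(x, err).solution         # least-squares map x -> err, [d_model, d_model]
    pred = x @ W                                    # linearly predicted part of the residual
    fvu = (err - pred).pow(2).sum() / err.pow(2).sum()
    return 1.0 - fvu                                # fraction of error variance explained linearly
```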
While end-to-end training with a KL divergence objective requires more compute, using KL divergence only for fine-tuning is effective.
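A minimal sketch of such a KL fine-tuning objective, assuming `clean_logits` come from the unmodified model and `patched_logits` from a forward pass in which the hooked activation is replaced by the SAE reconstruction (how that splice is wired up depends on your setup); only the SAE parameters would be updated against this loss.

```python
# Hedged sketch of a KL fine-tuning loss for an SAE.
import torch
import torch.nn.functional as F

def kl_finetune_loss(clean_logits: torch.Tensor, patched_logits: torch.Tensor) -> torch.Tensor:
    log_p = F.log_softmax(clean_logits, dim=-1)     # original model's next-token distribution
    log_q = F.log_softmax(patched_logits, dim=-1)   # distribution with the SAE reconstruction spliced in
    # KL(original || patched), summed over the vocabulary, batchmean reduction
    return F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
```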

L0 is not neutral

 
 

Recommendations