SAE MDL

Creator: Seonglae Cho
Created: 2025 Mar 17 11:47
Edited: 2025 Mar 17 14:36

Minimal Description Length

An explanation e of some phenomenon is a statement such that knowing e gives some information about that phenomenon. An explanation is typically a natural-language statement.
The Description Length (DL) of an explanation e is given as DL(e) = |e|, where |·| is a metric denoting the number of bits needed to send the explanation through a communication channel.
For an SAE, the DL is a sum of two terms. The first term grows with the dataset size and the second term is constant in the dataset size, so the first term dominates in the large-data regime.
We say an SAE is MDL-optimal if it attains the minimum description length.

SAEs should be sparse, but not too sparse

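An upper bound on the DL is:
DL ≤ L0 · (B + log2 N)
where L0 is the number of active features per input, N is the dictionary size, B is the effective precision of each float, and log2 N is the number of bits required to specify which features are active.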
  • Sparsity is a key component of minimizing description length (DL)
  • There’s an inherent trade-off between decreasing L0 and decreasing the dictionary size in order to reduce description length
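The trade-off can be made concrete with a minimal sketch. It assumes the upper bound has the form L0 · (B + log2 N), with N the dictionary size and B the effective bits per float. The function name `dl_upper_bound` and all three configurations are hypothetical and chosen only for illustration; they are not measurements from any trained SAE.

```python
import math


def dl_upper_bound(l0: float, dict_size: int, bits_per_float: float) -> float:
    """Assumed upper bound on bits per token: L0 * (B + log2 N)."""
    if dict_size < 1:
        raise ValueError("dict_size must be at least 1")
    return l0 * (bits_per_float + math.log2(dict_size))


# Hypothetical configurations (illustrative only, not from the source).
B = 7.0  # assumed effective precision of each float, in bits
configs = {
    "small dictionary, many active features": (64, 2**12),
    "medium dictionary, fewer active features": (16, 2**16),
    "huge dictionary, one active feature": (1, 2**600),
}

bounds = {
    name: dl_upper_bound(l0, n, B) for name, (l0, n) in configs.items()
}

for name, bits in bounds.items():
    print(f"{name}: {bits:.0f} bits per token")
```

In this toy setting the middle configuration has the smallest bound: pushing L0 down to 1 only helps if the dictionary does not have to grow so much that the index cost log2 N takes over.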
 
 

Minimum Description Length prefers hierarchical features

  • Optimizing for MDL can reduce undesirable feature splitting
  • Hierarchical features allow for more efficient coding schemes
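One way to see the first point is a short worked bound. It assumes the DL upper bound has the form L0 · (B + log2 N), with N the dictionary size and B the bits per float, and it assumes that splitting every feature into k variants leaves the number of active features per token roughly unchanged. Both are assumptions of this sketch, and k is a hypothetical split factor.

```latex
\begin{aligned}
\mathrm{DL}_{\text{split}} &\le L_0 \bigl(B + \log_2 (kN)\bigr) \\
                           &= L_0 \bigl(B + \log_2 N\bigr) + L_0 \log_2 k
\end{aligned}
```

Under these assumptions, splitting adds about L0 · log2 k bits per token to the bound without any gain in sparsity, which is why an MDL objective would tend to disfavor it.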
 
 

How to use DL as a metric

DL can be compared across representations, although the comparison is slightly unfair because the SAE is lossy.
  • DL per input token
    • GPT-2 itself: 5376 bits per token
    • SAE for GPT-2: 1405 bits per token
    • One-hot encoding (the dictionary has a row for each neural activation in the dataset, so exactly one feature is active per input): 13,993 bits per token
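The figures in the list above can be put side by side with a little arithmetic. The sketch below takes the three bits-per-token values as given and computes ratios against the SAE. The decomposition of 5376 as 768 × 7 (a 768-dimensional activation at 7 effective bits per float) is an assumption that happens to match the quoted number; the source does not state it.

```python
# Bits per token, as quoted in the list above.
dl_bits = {
    "GPT-2 activations": 5376,
    "SAE": 1405,
    "one-hot encoding": 13993,
}

# Assumption (not stated in the source): 768 floats at 7 effective bits each.
d_model = 768
bits_per_float = 7
dense_bits = d_model * bits_per_float


def ratio_vs_sae(name: str) -> float:
    """How many times more bits a representation needs than the SAE."""
    return dl_bits[name] / dl_bits["SAE"]


for name in dl_bits:
    print(f"{name}: {dl_bits[name]} bits, {ratio_vs_sae(name):.2f}x the SAE")
```

On these numbers the SAE code is a bit under four times shorter than the raw activations and roughly ten times shorter than the one-hot code, keeping in mind that the SAE reconstruction is lossy.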
 
 
 
 
 

Recommendations