Minimum Description Length
An explanation of some phenomenon $X$ is a statement $e$ for which knowing $e$ gives some information about $X$. An explanation is typically a natural-language statement.
The Description Length (DL) of an explanation $e$ is given as $\mathrm{DL}(e) = C(e)$, where $C(\cdot)$ is a metric denoting the number of bits needed to send the explanation through a communication channel.
For an SAE, $\mathrm{DL} = N \cdot C(\text{activations}) + C(\text{weights})$: the cost of sending the latent activations for every input, plus the one-off cost of sending the SAE weights. The first term is $O(N)$ and the second term is $O(1)$ in the dataset size $N$, so the first term dominates in the large-$N$ regime.
We say an SAE is MDL-optimal if it attains this minimum.
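A toy sketch of the two terms (all numbers here are hypothetical, chosen only to show the scaling):

```python
def total_dl(n_tokens, bits_per_token_activations, bits_for_weights):
    """Total description length: per-token activation cost, which scales
    linearly with the dataset, plus a fixed one-off cost for the weights."""
    return n_tokens * bits_per_token_activations + bits_for_weights

# Hypothetical: ~1405 bits/token for activations and ~1e9 bits for weights.
small = total_dl(10_000, 1405, 1_000_000_000)        # weight cost dominates
large = total_dl(1_000_000_000, 1405, 1_000_000_000)  # activation cost dominates

# In the large-N regime the O(N) activation term swamps the O(1) weight term.
print(large / (1_000_000_000 * 1405))  # close to 1
```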
SAEs should be sparse, but not too sparse
An upper bound on the per-token DL is:

$$\mathrm{DL} \leq L_0 \, p + \log_2 \binom{D}{L_0}$$

where $p$ is the effective precision of each float and $\log_2 \binom{D}{L_0}$ is the number of bits required to specify which of the $D$ features are active.
- Sparsity is a key component of minimizing description length (DL)
- There’s an inherent trade-off between decreasing L0 and decreasing the dictionary size in order to reduce description length
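A minimal sketch of this trade-off, assuming the per-token bound takes the form $L_0 \, p + \log_2 \binom{D}{L_0}$ (the helper name and all numbers here are illustrative):

```python
from math import comb, log2

def dl_per_token(l0, d, p=7):
    """Assumed upper bound: L0 active floats at p bits each, plus the bits
    needed to specify which of the D dictionary features are active."""
    return l0 * p + log2(comb(d, l0))

# Decreasing L0 shrinks the float term; decreasing D shrinks the index term.
# But a sparser code generally needs a larger dictionary to stay faithful,
# so the two knobs cannot both be pushed down freely:
print(dl_per_token(32, 16_384))    # moderate sparsity, moderate dictionary
print(dl_per_token(64, 16_384))    # denser code, same dictionary
print(dl_per_token(32, 131_072))   # same sparsity, larger dictionary
```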
Minimum Description Length prefers hierarchical features
- Optimizing for MDL can reduce undesirable feature splitting
- Hierarchical features allow for more efficient coding schemes
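One way to see this: a hierarchical code can send a general parent feature and pay for a fine-grained refinement only on the tokens where it matters, whereas a fully split flat dictionary always pays for the most specific feature. A toy calculation with hypothetical parameters:

```python
from math import log2

# Hypothetical setup: a flat SAE splits each parent feature into 16
# specialized children, multiplying the dictionary size by 16.
d_parent = 1024                    # dictionary of parent features
n_children = 16                    # refinements per parent
d_flat = d_parent * n_children     # flat dictionary after full splitting

refine_rate = 0.25                 # fraction of tokens where the refinement matters

flat_bits = log2(d_flat)           # always index the specific child feature
hier_bits = log2(d_parent) + refine_rate * log2(n_children)

print(flat_bits, hier_bits)  # 14.0 vs 11.0: the hierarchical code is cheaper
```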
How to use DL as a metric
The comparison is slightly unfair because the SAE is lossy:
- DL per input token:
- GPT2 itself: 5376 bits per token
- SAE for GPT2: 1405 bits per token
- One-hot encoding (the dictionary has a row for each neural activation in the dataset, so $L_0 = 1$ and $D$ equals the dataset size): 13,993 bits per token
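As a sanity check on the first figure: assuming GPT2-small's residual-stream dimension of 768, the 5376 bits per token corresponds to an effective precision of 7 bits per float:

```python
# Assumed parameters: GPT2-small residual stream dimension 768, with an
# effective precision of 7 bits per float when sending raw activations.
d_model = 768
p_bits = 7
print(d_model * p_bits)  # 5376 bits per token, matching the figure above
```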