- When the MCS* of each feature in one dictionary is computed against the features of another dictionary, the resulting distribution has two peaks:
  - One peak sits near MCS ≈ 1: the feature has a near-identical counterpart in the other dictionary.
  - The other peak sits around MCS ≈ 0.3: the feature has no close match and looks random.
- The distribution is therefore bimodal, and each peak likely corresponds to a different kind of feature:
  - High-MCS features: likely meaningful, "real" features that recur across multiple dictionaries.
  - Low-MCS features: likely non-meaningful features that behave like random noise or dead neurons.
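*MCS (maximum cosine similarity): for a given feature, the highest cosine similarity between its direction and any feature direction in the other dictionary.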
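A minimal NumPy sketch of the MCS computation, assuming each dictionary is given as a matrix with one feature direction per row (for example, SAE decoder vectors). The functions `max_cosine_similarity` and `split_by_mcs`, the 0.6 cutoff between the two peaks, and the synthetic dictionaries are all hypothetical and only illustrate the idea; they are not taken from the original analysis.

```python
import numpy as np


def max_cosine_similarity(dict_a: np.ndarray, dict_b: np.ndarray) -> np.ndarray:
    """MCS of every feature in dict_a against dict_b.

    Both arrays hold one feature direction per row, shape (n_features, d_model).
    Returns shape (n_a,): for each row i, max_j cos(dict_a[i], dict_b[j]).
    """
    a = dict_a / np.linalg.norm(dict_a, axis=1, keepdims=True)
    b = dict_b / np.linalg.norm(dict_b, axis=1, keepdims=True)
    cos = a @ b.T  # (n_a, n_b) pairwise cosine similarities
    return cos.max(axis=1)


def split_by_mcs(mcs: np.ndarray, threshold: float = 0.6):
    """Split features between the two peaks (threshold is an assumed cutoff)."""
    shared = np.flatnonzero(mcs >= threshold)    # high peak: has a counterpart
    unmatched = np.flatnonzero(mcs < threshold)  # low peak: no close match
    return shared, unmatched


# Hypothetical synthetic data: dict_a = 256 noisy copies of dict_b features
# followed by 256 random directions, so its MCS histogram has two peaks.
rng = np.random.default_rng(0)
d_model, n_b = 128, 1024
dict_b = rng.standard_normal((n_b, d_model))
copies = dict_b[:256] + 0.05 * rng.standard_normal((256, d_model))
random_feats = rng.standard_normal((256, d_model))
dict_a = np.vstack([copies, random_feats])

mcs = max_cosine_similarity(dict_a, dict_b)
shared, unmatched = split_by_mcs(mcs)
counts, edges = np.histogram(mcs, bins=20, range=(0.0, 1.0))
print(len(shared), len(unmatched))
```

Plotting `counts` against `edges` shows one cluster near 1 (the copied features) and one at low MCS (the random ones). On real dictionaries the cutoff would be read off the empirical histogram rather than fixed in advance.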
The ultralow density cluster appears to be an artifact of the autoencoder training process rather than a real property of the underlying transformer.
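A sketch of how an ultralow density cluster can be located, assuming "density" means the fraction of tokens on which a feature is active (activation > 0). The helper names, the log10 cutoff of -4, and the synthetic activations are hypothetical choices for illustration only.

```python
import numpy as np


def feature_density(acts: np.ndarray) -> np.ndarray:
    """Fraction of tokens on which each feature fires.

    acts: (n_tokens, n_features) SAE feature activations.
    """
    return (acts > 0).mean(axis=0)


def ultralow_density_features(acts: np.ndarray, log10_threshold: float = -4.0):
    """Return indices of features whose log10 density is below an assumed cutoff."""
    density = feature_density(acts)
    log_density = np.log10(density + 1e-10)  # epsilon keeps dead features finite
    return np.flatnonzero(log_density < log10_threshold), log_density


# Hypothetical activations: 6 ordinary features firing on ~50% of tokens,
# feature 6 firing on only 3 of 100,000 tokens, feature 7 never firing.
rng = np.random.default_rng(0)
n_tokens, n_features = 100_000, 8
acts = np.maximum(rng.standard_normal((n_tokens, n_features)), 0.0)
acts[:, 6:] = 0.0
acts[:3, 6] = 1.0

flagged, log_density = ultralow_density_features(acts)
print(flagged)
```

A histogram of `log_density` over a real dictionary is the usual way to see whether such features form a separate cluster; the fixed cutoff here is only a stand-in for reading that histogram.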

Seonglae Cho
[Research Update] Sparse Autoencoder features are bimodal (aizi.substack.com)