- When calculating MCS between features extracted from specific dictionaries, the resulting distribution has two peaks:
- One peak lies where MCS values are close to 1 (features that have a near-identical counterpart).
- The other lies where MCS values are around 0.3 (features that have no close counterpart and look random).
- This phenomenon is called bimodality, and each peak likely indicates the following:
- Features with high MCS values: These are likely meaningful "real" features that are repeatedly found across multiple dictionaries.
- Features with low MCS values: These are likely not meaningful; they resemble random noise or dead neurons.
*MCS (maximum cosine similarity)
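
As a rough illustration of the measure described above, here is a minimal sketch of computing MCS for every feature of one dictionary against another. It assumes each dictionary is a list of feature vectors of equal dimension. The function names (`normalize`, `max_cosine_similarity`) and the synthetic data are hypothetical and not taken from any particular codebase. Only the standard library is used so that the sketch is self-contained; in practice the same computation would normally be a single matrix product on normalized weight matrices.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def max_cosine_similarity(dict_a, dict_b):
    """For each feature in dict_a, return its highest cosine similarity
    to any feature in dict_b."""
    unit_a = [normalize(v) for v in dict_a]
    unit_b = [normalize(v) for v in dict_b]
    return [
        max(sum(x * y for x, y in zip(u, w)) for w in unit_b)
        for u in unit_a
    ]


# Synthetic stand-in data (an assumption, not real dictionary weights):
# the first 20 features of dict_a are noisy copies of features in dict_b,
# the last 20 are independent random directions.
random.seed(0)
dim = 32
dict_b = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(128)]
shared = [[x + 0.05 * random.gauss(0, 1) for x in v] for v in dict_b[:20]]
unshared = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
dict_a = shared + unshared

mcs = max_cosine_similarity(dict_a, dict_b)
print("shared features, min MCS:  ", round(min(mcs[:20]), 3))
print("unshared features, max MCS:", round(max(mcs[20:]), 3))
```

In this toy setup the shared features should land near 1 and the unshared ones noticeably lower, which is a small-scale analogue of the two peaks. Where the lower peak sits depends on the vector dimension and dictionary size, so the value seen here need not match the 0.3 quoted above.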
The ultralow density cluster appears to be an artifact of the autoencoder training process and not a real property of the underlying transformer.