Neighbor Distance Minimization
Unlike a neuron-level SAE, which searches for individual monosemantic features, this method is designed to capture subspaces in which sets of mutually exclusive features (i.e., the values of a variable) cluster together. In other words, it assumes the representation space contains subspaces made up of related features, partitions the dimensions into predetermined blocks c, and learns by minimizing the sum of nearest-neighbor distances within each subspace.
Since it is not searching for monosemantic features, each projection needs fewer dimensions, and as the nearest-neighbor distances shrink, the entropy within each subspace decreases; the dimension partitions are learned so as to reduce this entropy. Clustering is thus achieved through kNN distance minimization.
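The objective above can be sketched as follows. This is a minimal numpy illustration, not the original implementation: the function name `nn_distance_loss`, the batch-wise 1-NN computation, and the fixed partition sizes are all assumptions made for clarity.

```python
import numpy as np

def nn_distance_loss(z, partition_sizes):
    """Sum over dimension partitions of the mean 1-NN distance within each subspace.

    z: (batch, dim) array of representations.
    partition_sizes: sizes of the predetermined dimension blocks (assumed fixed here).
    """
    losses = []
    start = 0
    for c in partition_sizes:
        sub = z[:, start:start + c]          # project onto this partition's subspace
        start += c
        # pairwise Euclidean distances within the batch
        diff = sub[:, None, :] - sub[None, :, :]
        dist = np.sqrt((diff ** 2).sum(-1))
        np.fill_diagonal(dist, np.inf)       # exclude each point's distance to itself
        losses.append(dist.min(axis=1).mean())  # mean nearest-neighbor distance
    return float(sum(losses))

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 12))
loss = nn_distance_loss(z, [4, 4, 4])
```

Minimizing this loss (with gradients flowing into the map that produces `z`) pulls points within each subspace toward their nearest neighbors, which is what produces the clustering behavior described above.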