Interpretable features tend to arise (at a given level of abstraction) if and only if the training distribution is diverse enough (at that level of abstraction).
Diversity Hypothesis
Creator

Created
2025 Feb 4 11:5
Editor

Edited
2025 Feb 27 20:57