Interpretable features tend to arise (at a given level of abstraction) if and only if the training distribution is diverse enough (at that level of abstraction).
Diversity Hypothesis
Creator: Seonglae Cho
Created: 2025 Feb 4 11:5
Editor: Seonglae Cho
Edited: 2025 Feb 27 20:57
Refs