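A reference-free, information-theoretic metric for measuring dataset diversity. It is defined as the exponential of the Shannon entropy of the eigenvalues of K/n, where K is the n × n similarity kernel matrix over the samples: Vendi Score = exp(-Σ λ_i log λ_i), with λ_i the eigenvalues of K/n. Assuming each sample has similarity 1 with itself, the score can be read as the effective number of distinct samples, ranging from 1 (all samples identical) to n (all samples completely dissimilar). The similarity function is user-defined, for example a kernel in embedding space, so the notion of diversity being measured can be tailored to the task. The main limitation is computational cost: building K and computing its eigenvalues becomes expensive as n grows (a dense eigendecomposition is cubic in n).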
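The definition can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' reference implementation: the function name `vendi_score`, the default cosine-similarity kernel, and the eigenvalue cutoff `eps` are all assumptions made for this example, and rows of the input are assumed to be non-zero vectors.

```python
import numpy as np

def vendi_score(X, kernel=None, eps=1e-12):
    """Sketch of the Vendi Score: exp of the Shannon entropy of eig(K / n).

    X      : (n, d) array-like of sample embeddings (rows assumed non-zero).
    kernel : optional function k(a, b) -> similarity with k(a, a) == 1.
             Defaults to cosine similarity (an assumption of this sketch).
    eps    : eigenvalues at or below this cutoff are treated as zero.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    if kernel is None:
        # Cosine similarity: normalise rows, then take inner products.
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        K = Xn @ Xn.T
    else:
        # Generic (slow) pairwise evaluation of a user-supplied kernel.
        K = np.array([[kernel(a, b) for b in X] for a in X], dtype=float)

    # K is symmetric, so eigvalsh applies; the eigenvalues of K / n sum to 1.
    eigvals = np.linalg.eigvalsh(K / n)

    # Drop numerically zero (or slightly negative) eigenvalues: 0 * log 0 := 0.
    eigvals = eigvals[eigvals > eps]

    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


if __name__ == "__main__":
    # Two orthogonal samples behave like two fully distinct items.
    print(vendi_score([[1.0, 0.0], [0.0, 1.0]]))
    # Three identical samples behave like a single item.
    print(vendi_score([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]))
```

The eigendecomposition of the n × n matrix is the step that dominates the cost for large n, which is the limitation noted above. Filtering near-zero eigenvalues follows the usual convention that 0 · log 0 = 0 and avoids `log(0)` warnings.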
https://arxiv.org/pdf/2210.02410

Seonglae Cho