Gradient space captures "differences from the model's perspective" (cognitive adjustment, Surprise Score). Because per-sample gradients are very high-dimensional, they are compressed with a Johnson–Lindenstrauss random projection, which keeps computation tractable while approximately preserving distances. Data diversity is then quantified as follows: take the loss gradient of a small proxy model for each sample, apply the random projection to obtain a representation, and compute the Vendi Score of the covariance of these representations. Vendi Score is chosen because it measures the effective number of distinct directions in the gradient distribution. The resulting score correlates strongly with OOD performance (Spearman ρ≈0.9). Prismatic Synthesis applies this iteratively: (1) K-means clustering in gradient space → (2) few-shot generation of new samples → (3) keep only the samples that fall into sparse clusters, so that data scale and diversity increase simultaneously.
arxiv.org
https://arxiv.org/pdf/2505.20161
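
A minimal sketch of the diversity measure described above, not the paper's implementation. Assumptions: the "gradients" are synthetic NumPy arrays standing in for the proxy model's per-sample loss gradients, the projection is a plain Gaussian matrix, and the helper names `random_projection` and `vendi_score` and the dimensions (2048 → 64) are made up for illustration. The Vendi Score is computed as the exponential of the Shannon entropy of the eigenvalues of the trace-normalized covariance.

```python
import numpy as np


def random_projection(grads, out_dim, seed=0):
    """Gaussian random projection (Johnson-Lindenstrauss style) to out_dim dimensions."""
    rng = np.random.default_rng(seed)
    # Entries ~ N(0, 1/out_dim) so squared norms are preserved in expectation.
    proj = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(grads.shape[1], out_dim))
    return grads @ proj


def vendi_score(features, eps=1e-12):
    """Effective number of distinct directions in the rows of `features`."""
    # Normalize each row to unit length so the covariance has trace 1.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, eps)
    cov = unit.T @ unit / unit.shape[0]
    eigvals = np.linalg.eigvalsh(cov)
    # Drop numerically zero or negative eigenvalues, then renormalize.
    eigvals = eigvals[eigvals > eps]
    eigvals = eigvals / eigvals.sum()
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


rng = np.random.default_rng(0)
# Hypothetical stand-ins for per-sample loss gradients (200 samples, 2048 dims).
# "narrow": almost all gradients point along one direction (low diversity).
narrow = np.outer(rng.normal(size=200), rng.normal(size=2048))
narrow += 0.01 * rng.normal(size=(200, 2048))
# "broad": gradients point in many unrelated directions (high diversity).
broad = rng.normal(size=(200, 2048))

narrow_score = vendi_score(random_projection(narrow, 64))
broad_score = vendi_score(random_projection(broad, 64))
print(f"narrow: {narrow_score:.2f}  broad: {broad_score:.2f}")
```

The score ranges from 1 (all gradients collinear) up to the projected dimension (gradients spread evenly over all directions), so the broad set should score far higher than the narrow one. In a real pipeline the synthetic arrays would be replaced by flattened proxy-model gradients, one row per training sample.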

Seonglae Cho