GIST

Creator: Seonglae Cho
Created: 2026 Feb 18 17:27
Edited: 2026 Feb 18 17:29

Greedy Independent Set Thresholding

An algorithm for selecting a small but high-quality training subset from a large-scale dataset, maximizing diversity and utility at the same time. GIST sets a distance threshold so that no two selected data points are too close (that is, too similar) to each other, and under this constraint it greedily selects the points with the highest utility. The underlying selection problem is NP-hard, so an exact solution is impractical to compute at scale. GIST instead comes with a provable guarantee: its solution achieves at least 1/2 of the optimal objective value.
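A minimal Python sketch of the thresholded greedy idea described above, assuming Euclidean distance between points, a precomputed per-point utility score, a selection budget `k`, and a fixed list of candidate thresholds. The function names `greedy_threshold_select`, `objective` and `gist_sketch`, the weight `lam`, and the scoring rule (total utility plus `lam` times the minimum pairwise distance) are all hypothetical illustrations, not necessarily the exact objective or threshold schedule GIST uses, and the sketch does not reproduce the 1/2-approximation analysis. Because utility is assumed additive here, sorting once by individual utility is equivalent to picking the largest marginal gain at each greedy step.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_threshold_select(
    points: Sequence[Point], utility: Sequence[float], k: int, threshold: float
) -> List[int]:
    """Greedily pick up to k indices by descending utility, skipping any
    candidate closer than `threshold` to a point already selected."""
    order = sorted(range(len(points)), key=lambda i: utility[i], reverse=True)
    selected: List[int] = []
    for i in order:
        if len(selected) == k:
            break
        # Distance constraint: no two selected points may be too similar.
        if all(math.dist(points[i], points[j]) >= threshold for j in selected):
            selected.append(i)
    return selected


def objective(
    points: Sequence[Point], utility: Sequence[float], selected: List[int], lam: float
) -> float:
    """Hypothetical score: total utility + lam * minimum pairwise distance."""
    total = sum(utility[i] for i in selected)
    if len(selected) < 2:
        return total  # diversity term treated as 0 for fewer than two points
    min_dist = min(
        math.dist(points[a], points[b])
        for a, b in itertools.combinations(selected, 2)
    )
    return total + lam * min_dist


def gist_sketch(
    points: Sequence[Point],
    utility: Sequence[float],
    k: int,
    thresholds: Sequence[float],
    lam: float = 1.0,
) -> List[int]:
    """Run the greedy step for each candidate threshold; keep the best-scoring subset."""
    best: List[int] = []
    best_score = -math.inf
    for d in thresholds:
        candidate = greedy_threshold_select(points, utility, k, d)
        score = objective(points, utility, candidate, lam)
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (2.0, 0.0)]
    util = [1.0, 0.9, 0.5, 0.4]
    print(greedy_threshold_select(pts, util, k=2, threshold=0.5))  # [0, 2]
    print(gist_sketch(pts, util, k=2, thresholds=[0.0, 0.5, 1.5]))  # [0, 3]
```

With a threshold of 0.5, point 1 is skipped because it lies only 0.1 from the higher-utility point 0. Sweeping larger thresholds trades some total utility for more spread; under this toy score, the widest threshold (1.5) yields the best subset.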
research.google
