Distributionally robust optimization
DRO Usages
Group DRO
DRO-LM
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
https://huggingface.co/papers/2305.10429