Distributed Low-Communication
Streaming DiLoCo by DeepMind
- Synchronize only subsets of parameters in sequence, rather than all at once, which greatly reduces peak bandwidth
- Allow workers to continue training while synchronizing, which decreases wall-clock time
- Quantize the data exchanged between workers, which further reduces the bandwidth used across workers (a minimal sketch of these ideas follows after this list)
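Below is a minimal single-process sketch of the first and third ideas: staggered per-fragment synchronization and quantized exchange. It is not DeepMind's or Prime Intellect's implementation; the worker count, fragment schedule, int8 round-trip quantizer, and plain-SGD outer update are illustrative assumptions, and the random local updates stand in for real gradient steps. The second idea, overlapping communication with continued training, is not modeled in a single process.

```python
# Single-process simulation of partial, staggered, quantized synchronization.
import torch

torch.manual_seed(0)

NUM_WORKERS = 4     # assumed number of DiLoCo workers
NUM_FRAGMENTS = 5   # parameters are partitioned into this many fragments
SYNC_PERIOD = 100   # inner steps between two synchronizations of the same fragment

def quantize_int8(t: torch.Tensor) -> torch.Tensor:
    """Round-trip a tensor through int8 with a per-tensor scale to mimic low-precision exchange."""
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    return (t / scale).round().clamp(-127, 127) * scale

# A flat parameter vector stands in for a real model; every worker starts
# from the same global copy.
global_params = torch.randn(10_000)
workers = [global_params.clone() for _ in range(NUM_WORKERS)]
fragments = torch.chunk(torch.arange(global_params.numel()), NUM_FRAGMENTS)

for step in range(1, 1001):
    # Inner optimization: each worker takes a local step on its own data
    # (random directions here stand in for real gradient updates).
    for w in workers:
        w -= 0.01 * torch.randn_like(w)

    # Streaming synchronization: fragments follow staggered schedules, so at
    # most one fragment is communicated per step, which bounds peak bandwidth.
    for frag_id, idx in enumerate(fragments):
        offset = frag_id * (SYNC_PERIOD // NUM_FRAGMENTS)
        if (step - offset) % SYNC_PERIOD != 0:
            continue
        # The "outer gradient" per worker is the global fragment minus the
        # local fragment; quantizing it shrinks the data exchanged.
        outer_grads = [quantize_int8(global_params[idx] - w[idx]) for w in workers]
        avg_outer_grad = torch.stack(outer_grads).mean(dim=0)
        # Outer update with learning rate 1.0 (plain SGD here; DiLoCo uses
        # Nesterov momentum), i.e. the fragment becomes the worker average.
        global_params[idx] -= 1.0 * avg_outer_grad
        # Broadcast the updated fragment back to every worker.
        for w in workers:
            w[idx] = global_params[idx]
```

With the outer learning rate set to 1.0, each fragment sync reduces to averaging that fragment across workers; swapping in a momentum-based outer optimizer, as the papers do, only changes the outer update line.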
DiLoCo: Distributed Low-Communication Training of Language Models
Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected...
https://arxiv.org/abs/2311.08105

OpenDiLoCo: An Open-Source Framework for Globally Distributed...
OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models. We provide a reproducible implementation of the...
https://arxiv.org/abs/2407.07852

OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
Introducing OpenDiLoCo, an open-source implementation and scaling of DeepMind’s Distributed Low-Communication (DiLoCo) method, enabling globally distributed AI model training.
https://www.primeintellect.ai/blog/opendiloco


Seonglae Cho