Branch-Train-Merge: Embarrassingly Parallel Training of Expert...
We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). We show it is possible to independently train...
https://arxiv.org/abs/2208.03306