Cluster data as a preprocessing step, then train LMs (one expert per cluster). A naive approach is parameter averaging to merge them, but this causes the knowledge from each expert to mix or cancel out, making it difficult to preserve domain-specific capabilities.

Branch-Train-Merge: Embarrassingly Parallel Training of Expert...
"We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). We show it is possible to independently train..."
https://arxiv.org/abs/2208.03306
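The cancellation problem with naive parameter averaging can be sketched in a few lines. This is a toy illustration, not the paper's code: each "expert" is a hypothetical dict of named weight arrays, and `average_experts` is an assumed helper name.

```python
import numpy as np

def average_experts(experts):
    """Merge expert parameter dicts by elementwise averaging.

    Uniform averaging mixes the experts' weights, so domain-specific
    parameters that point in opposing directions can cancel out.
    """
    keys = experts[0].keys()
    return {k: np.mean([e[k] for e in experts], axis=0) for k in keys}

# Two toy "experts" whose weights oppose each other:
expert_a = {"w": np.array([1.0, -1.0])}
expert_b = {"w": np.array([-1.0, 1.0])}

merged = average_experts([expert_a, expert_b])
print(merged["w"])  # the averaged weights cancel to zero
```

Here both experts' distinctive weights vanish in the merged model, which is the failure mode the note describes.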