Multi-GPU or multi-node distributed training, federated learning
Data parallelism or model parallelism
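- In data parallelism, the training data is split into shards; each processor holds a full copy of the model and processes its own shard
- In model parallelism, the model itself is split, and different parts of it are processed by separate processors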
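
The difference between the two can be sketched with a toy example. This is only an illustration in pure Python: the "workers" and "devices" are simulated with plain lists, and all function names (`grad_mse`, `data_parallel_grad`, `stage_0`, `stage_1`, `model_parallel_forward`) are hypothetical, not part of any library. It assumes equal-sized shards, in which case averaging the per-worker gradients matches the full-batch gradient.

```python
# Toy sketch: workers/devices are simulated, nothing runs in parallel here.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_grad(w, xs, ys, num_workers):
    """Data parallelism: every worker has the same w but a different shard."""
    shard = len(xs) // num_workers  # assumes len(xs) divides evenly
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    # Stand-in for an all-reduce: average the per-worker gradients.
    return sum(grads) / num_workers


def stage_0(x, w0):
    """First part of the model, imagined to live on device 0."""
    return [w0 * v for v in x]


def stage_1(h, w1):
    """Second part of the model, imagined to live on device 1."""
    return [w1 * v for v in h]


def model_parallel_forward(x, w0, w1):
    """Model parallelism: activations are handed from one device to the next."""
    return stage_1(stage_0(x, w0), w1)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

full = grad_mse(0.5, xs, ys)                               # single-device gradient
sharded = data_parallel_grad(0.5, xs, ys, num_workers=2)   # two simulated workers
out = model_parallel_forward(xs, w0=2.0, w1=3.0)           # two simulated devices
```

In the data-parallel case every worker sees the whole model but only part of the data; in the model-parallel case every input passes through all devices, each of which holds only part of the model.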
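When several forms of parallelism are combined, the result is referred to as 3D parallelism or 4D parallelism, depending on how many are combined (3D parallelism commonly means data, tensor, and pipeline parallelism used together)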
Parallel Training Notion