Multi-GPU | multi-node Distributed Training, Federated learning
Data parallelism or model parallelism
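- In data parallelism, every processor holds a full copy of the model and the training data is split into multiple parts, one per processor; the gradients are averaged across processors after each step
- In model parallelism, the model itself is split, so different parts of the model (layers or slices of weight tensors) are processed by separate processors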
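The two strategies above can be contrasted in a single-process simulation. This is a minimal sketch under stated assumptions, not a real distributed implementation: the "workers" are just list shards, the linear model with MSE loss is a hypothetical stand-in, and the helpers `all_reduce_mean` and the concatenation step only imitate what a collective-communication library would do across GPUs.

```python
# Single-process simulation; no GPUs or communication library involved.
# The linear model, MSE loss, and equal shard sizes are illustrative assumptions.
from typing import List

Vector = List[float]
Matrix = List[Vector]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

# ---- Data parallelism: same model on every worker, different data shards ----

def mse_grad(w: Vector, xs: Matrix, ys: Vector) -> Vector:
    """Gradient of mean((w . x - y)^2) with respect to w over one batch."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = dot(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi
    return [g / len(xs) for g in grad]

def shard(data: list, num_workers: int) -> List[list]:
    """Split data into equally sized contiguous shards, one per worker."""
    assert len(data) % num_workers == 0, "sketch assumes equal shard sizes"
    size = len(data) // num_workers
    return [data[r * size:(r + 1) * size] for r in range(num_workers)]

def all_reduce_mean(grads: List[Vector]) -> Vector:
    """Hypothetical stand-in for an all-reduce: element-wise mean of gradients."""
    return [sum(col) / len(grads) for col in zip(*grads)]

def data_parallel_grad(w: Vector, xs: Matrix, ys: Vector, num_workers: int) -> Vector:
    # Each worker computes a local gradient on its own shard only.
    local = [mse_grad(w, x_r, y_r)
             for x_r, y_r in zip(shard(xs, num_workers), shard(ys, num_workers))]
    # After averaging, every replica applies the same update and stays in sync.
    return all_reduce_mean(local)

# ---- Model (tensor) parallelism: one layer's weights split across workers ----

def matvec(W: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in W]

def model_parallel_matvec(W: Matrix, x: Vector, num_workers: int) -> Vector:
    """Each worker owns a block of W's output rows; partial outputs are concatenated."""
    parts = [matvec(W_r, x) for W_r in shard(W, num_workers)]
    return [y for part in parts for y in part]  # stand-in for an all-gather
```

With equal shard sizes, the averaged data-parallel gradient matches the full-batch gradient, and the row-split layer produces the same output as the unsplit one, e.g. `model_parallel_matvec([[1.0, 0.0], [0.0, 1.0], [2.0, 3.0], [-1.0, 4.0]], [1.0, 2.0], 2)` gives `[1.0, 2.0, 8.0, 7.0]`.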
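Combining these strategies is called 3D parallelism (data, pipeline, and tensor parallelism, where pipeline and tensor parallelism are two forms of model parallelism), or 4D parallelism when a fourth axis such as context (sequence) parallelism is added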
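One way to picture 3D parallelism is as a grid of GPUs where each rank has a (data, pipeline, tensor) coordinate and joins one communication group per axis. The sketch below assumes a hypothetical layout in which the tensor-parallel index varies fastest (a common choice, since tensor parallelism communicates most and benefits from GPUs in the same node); the function names and the ordering are illustrative assumptions, not any specific framework's API. A 4D layout would add one more coordinate in the same way.

```python
# Hypothetical 3D rank layout: tp index varies fastest, then pp, then dp.
from typing import List, Tuple

AXES = ["dp", "pp", "tp"]

def rank_to_coords(rank: int, dp: int, pp: int, tp: int) -> Tuple[int, int, int]:
    """Map a global rank to its (data, pipeline, tensor) coordinates."""
    assert 0 <= rank < dp * pp * tp
    tp_idx = rank % tp
    pp_idx = (rank // tp) % pp
    dp_idx = rank // (tp * pp)
    return dp_idx, pp_idx, tp_idx

def coords_to_rank(dp_idx: int, pp_idx: int, tp_idx: int,
                   dp: int, pp: int, tp: int) -> int:
    """Inverse of rank_to_coords under the same assumed ordering."""
    return (dp_idx * pp + pp_idx) * tp + tp_idx

def group_members(rank: int, dp: int, pp: int, tp: int, axis: str) -> List[int]:
    """Ranks sharing every coordinate with `rank` except along `axis`."""
    coords = list(rank_to_coords(rank, dp, pp, tp))
    sizes = {"dp": dp, "pp": pp, "tp": tp}
    i = AXES.index(axis)
    members = []
    for k in range(sizes[axis]):
        coords[i] = k
        members.append(coords_to_rank(*coords, dp, pp, tp))
    return members
```

For example, with 16 GPUs split as dp=2, pp=2, tp=4, rank 5 sits at coordinates `(0, 1, 1)`: it shards one layer's weights with ranks `[4, 5, 6, 7]`, exchanges activations with pipeline peers `[1, 5]`, and averages gradients with data-parallel peers `[5, 13]`.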
Parallel Training Notion
Distributed ML Tools

Large Scale
Model Parallelism
https://huggingface.co/docs/transformers/v4.15.0/parallelism

Seonglae Cho