torch.distributed members
- NCCL → GPU communication (the default and fastest backend for CUDA tensors)
- Gloo → CPU communication and general fallback (GPU is possible but slower than NCCL)
- MPI → communication via an MPI launcher (requires a PyTorch build with MPI support; rarely used)
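
A minimal sketch of picking between these backends when initializing the default process group, assuming the script is launched with torchrun so that the env:// rendezvous variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE, LOCAL_RANK) are already set:

```python
import os
import torch
import torch.distributed as dist

def main():
    # Prefer NCCL on GPU machines; fall back to Gloo on CPU-only hosts.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend, init_method="env://")

    if backend == "nccl":
        # Each process drives one GPU; bind it by local rank.
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")

    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Sanity-check collective: sum each rank's id across all processes.
    t = torch.tensor([float(rank)], device=device)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}/{world_size}: all_reduce sum = {t.item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Run with e.g. `torchrun --nproc_per_node=2 script.py`; each process joins the same group and the all_reduce verifies that the chosen backend is actually communicating.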
Distributed communication package - torch.distributed — PyTorch documentation
https://pytorch.org/docs/stable/distributed.html

Seonglae Cho