IB, NVIDIA Quantum-2 InfiniBand
Stuck Process and SIGTERM Signal Interruption During Training with accelerate launch
Updated 2024 Feb 3 16:48
Environment Variables — NCCL 2.20.3 documentation
NCCL has an extensive set of environment variables to tune for specific usage.
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html
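When a multi-node run hangs, one common first step is to turn on NCCL logging and, as an experiment, take InfiniBand out of the picture. The sketch below builds such an environment in Python. `NCCL_DEBUG` and `NCCL_IB_DISABLE` are variables documented on the NCCL page linked above; the helper name `nccl_debug_env` and its parameters are made up for this note, and whether these settings help depends on the cluster.

```python
import os


def nccl_debug_env(base=None, disable_ib=False):
    """Return a copy of an environment with NCCL diagnostics enabled.

    `base` defaults to the current process environment. This helper is
    hypothetical (not part of NCCL or accelerate); it only sets variables.
    """
    env = dict(os.environ if base is None else base)
    # Ask NCCL to log its version, chosen transports, and warnings.
    env["NCCL_DEBUG"] = "INFO"
    if disable_ib:
        # Stop NCCL from using the IB/RoCE transport so it falls back to
        # IP sockets. Slower, but it helps tell whether the hang is IB-related.
        env["NCCL_IB_DISABLE"] = "1"
    return env


if __name__ == "__main__":
    # Start from an empty base so the printed result shows only NCCL settings.
    print(nccl_debug_env(base={}, disable_ib=True))
```

The returned dictionary can be passed as `env=` to `subprocess.run` when starting `accelerate launch` from a wrapper script, or the same two variables can be exported in the shell before launching. Disabling InfiniBand is meant as a diagnostic only, since it gives up the bandwidth the fabric provides.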
Switch
NVIDIA InfiniBand Switches
NVIDIA InfiniBand switches deliver high performance and port density at speeds of 40/56/100/200Gb/s for HPC, AI, Web 2.0, big data, clouds, and enterprise data centers.
https://www.nvidia.com/en-gb/networking/infiniband-switching/


Seonglae Cho