Distributed Data Parallel
Each GPU holds a full replica of the parameters, gradients, and optimizer states
- A sampler (DistributedSampler) sends a different shard of the data to each GPU
- Each GPU computes gradients of the model parameters on its own data
- An All-Reduce operation averages the gradients and distributes the result to every GPU
- Each GPU then applies the optimizer step, updating its model parameters
Since every replica applies the same averaged gradients, all GPUs are guaranteed to hold identical model parameters
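The cycle above can be sketched in plain Python without torch (all names here are hypothetical, for illustration only): each "GPU" keeps a full parameter replica, receives a disjoint data shard, computes local gradients, and an all-reduce average keeps every replica identical after the shared optimizer step.

```python
# Toy DDP update cycle: 4 simulated GPUs, a 2-parameter model, and a
# squared-error toy loss. Gradient of 0.5*(p - t)^2 w.r.t. p is (p - t).

def local_gradient(params, batch):
    # Step 2: each GPU computes gradients on its own data shard.
    return [sum(p - t for t in batch) / len(batch) for p in params]

def all_reduce_mean(grads_per_gpu):
    # Step 3: element-wise average across GPUs; every GPU gets the result.
    n = len(grads_per_gpu)
    dim = len(grads_per_gpu[0])
    return [sum(g[i] for g in grads_per_gpu) / n for i in range(dim)]

def ddp_step(replicas, shards, lr=0.1):
    grads = [local_gradient(p, s) for p, s in zip(replicas, shards)]
    avg = all_reduce_mean(grads)
    # Step 4: identical optimizer step on every GPU (plain SGD here).
    return [[p - lr * g for p, g in zip(rep, avg)] for rep in replicas]

replicas = [[1.0, -2.0] for _ in range(4)]   # 4 GPUs, same initialization
shards = [[0.5], [1.5], [-0.5], [2.5]]       # sampler: disjoint shards
replicas = ddp_step(replicas, shards)
# All replicas remain bit-identical after the step.
assert all(r == replicas[0] for r in replicas)
```

Because every replica sees the same averaged gradient, no parameter broadcast is needed after the first step; only gradients travel over the network.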
Types
- Multi node
- Multi GPU
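For the multi-node, multi-GPU case, PyTorch's `torchrun` launcher spawns one process per GPU and sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` for each. A minimal sketch (script name, node count, and endpoint are placeholder values):

```shell
# On each of 2 nodes with 8 GPUs, adjusting --node_rank per node (0 or 1):
torchrun \
  --nnodes=2 \
  --nproc_per_node=8 \
  --node_rank=0 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=$MASTER_ADDR:29500 \
  train.py
```

Inside `train.py`, the script calls `torch.distributed.init_process_group()` and wraps the model in `DistributedDataParallel` using the rank variables set by the launcher.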
From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease
https://huggingface.co/blog/pytorch-ddp-accelerate-transformers
Distributed Data Parallel in PyTorch Tutorial Series
Suraj Subramanian breaks down why distributed training is an important part of your ML arsenal; the series starts with a simple non-distributed training job.
https://www.youtube.com/playlist?list=PL_lsbAsL_o2CSuhUhJIiW0IkdT5C2wGWj


Seonglae Cho