DDP

Created
2024 Feb 26 7:54
Creator
Seonglae Cho
Edited
2024 Nov 10 12:51

Distributed Data Parallel

Each GPU holds its own full copy of the parameters, gradients, and optimizer states:
  • A sampler sends a different shard of the data to each GPU
  • Each GPU computes gradients of the model parameters from its own data
  • An All-Reduce operation averages the gradients and distributes the result to every GPU
  • Each GPU then updates the model parameters with the optimizer step
This guarantees that every GPU keeps identical model parameters, as in the sketch below.
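A minimal PyTorch sketch of this loop, assuming a toy linear model, random data, and a torchrun launch (e.g. `torchrun --nproc_per_node=NUM_GPUS train.py`); the DDP wrapper performs the gradient all-reduce during `backward()`:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE environment variables
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Every rank starts from the same parameters; DDP broadcasts them when wrapping
    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    # DistributedSampler gives each GPU a different shard of the dataset
    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shard assignment each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.mse_loss(model(x), y)
            optimizer.zero_grad()
            loss.backward()   # gradients are all-reduced (averaged) across GPUs here
            optimizer.step()  # every rank applies the same averaged gradients

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```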
 
 

Types

  • Multi node
  • Multi GPU
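A small sketch of how the two combine: one process is launched per GPU, and each process's global rank decomposes into a node index and a local GPU index. The node and GPU counts below are assumed for illustration; with torchrun this layout corresponds to `--nnodes=2 --nproc_per_node=4`.

```python
# Assumed example: 2 nodes x 4 GPUs
gpus_per_node = 4        # multi GPU: one process per GPU on a node
nnodes = 2               # multi node: number of machines
world_size = nnodes * gpus_per_node

for rank in range(world_size):
    node_id, local_rank = divmod(rank, gpus_per_node)
    print(f"global rank {rank} -> node {node_id}, local GPU {local_rank}")
```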
 
 
 
From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease (Hugging Face blog)
Distributed Data Parallel in PyTorch Tutorial Series (PyTorch tutorials, Suraj Subramanian)
 
