DualPipe

Creator: Seonglae Cho
Created: 2025 Jan 27 13:29
Edited: 2025 Mar 2 12:50

DualPipe overlaps the computation and communication within a pair of individual forward and backward chunks.

Each chunk is divided into four components:

  • attention
  • all-to-all dispatch
  • MLP
  • all-to-all combine
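A minimal sketch of the overlap idea (a hypothetical scheduler, not DeepSeek's implementation): the forward chunk runs its components in order while the paired backward chunk runs its components in reverse, so at every time slot one chunk is computing (attention/MLP) while the other is communicating (all-to-all dispatch/combine). All names below are illustrative.

```python
# Illustrative pairing of a forward chunk with a backward chunk.
# Forward order: attn -> dispatch -> mlp -> combine.
# Backward order is reversed, so compute and communication phases
# of the two chunks naturally land in opposite slots.

FWD = ["attn", "dispatch", "mlp", "combine"]          # forward chunk phases
BWD = ["combine_b", "mlp_b", "dispatch_b", "attn_b"]  # backward chunk phases
COMM = {"dispatch", "combine", "dispatch_b", "combine_b"}  # all-to-all phases

def paired_schedule(fwd, bwd):
    """Zip the two chunks slot by slot; in each slot one side computes
    while the other side's all-to-all runs, hiding the communication."""
    slots = []
    fwd, bwd = fwd[:], bwd[:]
    while fwd or bwd:
        a = fwd.pop(0) if fwd else None
        b = bwd.pop(0) if bwd else None
        slots.append((a, b))
    return slots

for a, b in paired_schedule(FWD, BWD):
    overlapped = (a in COMM) != (b in COMM)  # exactly one side communicates
    print(f"{a:12} || {b:12}  overlap={overlapped}")
```

Every slot pairs one compute phase with one communication phase, which is the scheduling property DualPipe exploits to keep both the GPUs and the network busy.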
DualPipe is an algorithm for efficient pipeline parallelism: it has fewer pipeline bubbles and hides most of the communication during training through computation-communication overlap. This overlap ensures that, as the model scales up further, we can still employ fine-grained experts across nodes with near-zero all-to-all communication overhead, as long as we maintain a constant computation-to-communication ratio.

Properties

  • DualPipe not only accelerates model training by effectively overlapping forward and backward computation-communication phases, but also reduces the pipeline bubbles.
  • Although DualPipe requires keeping two copies of the model parameters, this does not significantly increase the memory consumption since we use a large EP size during training.
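A back-of-envelope sketch of why the second parameter copy is cheap (the parameter counts and EP size below are illustrative assumptions, not DeepSeek-V3's actual configuration): with a large expert-parallel (EP) size, each rank holds only 1/EP of the expert parameters, so per-rank memory is dominated by the sharded experts and duplicating the resident parameters adds comparatively little.

```python
# Hypothetical per-rank parameter accounting: dense (non-expert) layers
# are replicated on every rank, expert layers are sharded across ep_size
# ranks. Numbers are illustrative only.

def per_rank_params(dense, expert, ep_size, copies=1):
    """Parameters resident on one rank for `copies` model copies."""
    return copies * (dense + expert / ep_size)

dense, expert = 2e9, 60e9  # assumed split of a 62B-parameter MoE model
one = per_rank_params(dense, expert, ep_size=64, copies=1)
two = per_rank_params(dense, expert, ep_size=64, copies=2)
print(f"1 copy:   {one / 1e9:.2f}B params per rank")
print(f"2 copies: {two / 1e9:.2f}B params per rank")
```

Even doubled, the per-rank parameter footprint stays a small fraction of the full model, which is the intuition behind the bullet above.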