Texonom / Science / Mathematics / Math Field / Statistics / Statistical Model / Model Generalization / Model Training / Parallel Training
Parallel Training

Creator
Seonglae Cho
Created
2022 Mar 15 11:35
Editor
Seonglae Cho
Edited
2025 Jun 9 1:25
Refs
AI Compiler Optimization
Distributed Optimizer

Multi-GPU and multi-node distributed training, including federated learning

Data parallelism or model parallelism

  • In data parallelism, the data is split into multiple parts, and each part is processed by a full replica of the model
  • In model parallelism, different parts of the model are processed by separate processors

These parallelism strategies are combined and commonly referred to as 3D or 4D parallelism.
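The data-parallel case above can be sketched in a few lines. This is a minimal single-process NumPy simulation (a hypothetical toy setup, not real distributed code): each "worker" holds a model replica and one shard of the batch, computes a local gradient, and the gradients are averaged — the role an all-reduce plays in a real multi-GPU setup — before every replica applies the same update.

```python
import numpy as np

def local_grad(w, X, y):
    # Gradient of mean squared error 0.5 * ||Xw - y||^2 / n on one data shard
    n = len(y)
    return X.T @ (X @ w - y) / n

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
shards = [(X[:4], y[:4]), (X[4:], y[4:])]  # batch split across 2 "workers"

for _ in range(2000):
    # Each worker computes its gradient independently (in parallel in practice)
    grads = [local_grad(w, Xs, ys) for Xs, ys in shards]
    g = np.mean(grads, axis=0)   # all-reduce: average gradients across workers
    w -= 0.1 * g                 # identical update applied on every replica
```

Because the shards are equally sized, the averaged gradient is exactly the full-batch gradient, so the replicas stay in sync without ever exchanging model weights — only gradients are communicated.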

Parallel Training Notion
Data Parallelism
Tensor Parallelism
Pipeline parallelism
Sequence Parallelism
Model Parallelism
AI Model Memory
Expert Parallelism
Context Parallelism
https://xiandong79.github.io

Large Scale

https://huggingface.co/spaces/nanotron/ultrascale-playbook
Model Parallelism
https://huggingface.co/docs/transformers/v4.15.0/parallelism
https://arxiv.org/pdf/2205.05198.pdf
https://soumith.ch/blog/2024-10-02-training-10k-scale.md.html

Backlinks

AI Optimization, python contextvars, Model Optimizer, Pytorch, AI Framework, MoE Routing, Model Training Tool

Copyright Seonglae Cho