Optimizer State Sharding

Created: 2024 Feb 26 16:53
Editor: Seonglae Cho
Creator: Seonglae Cho
Edited: 2024 Feb 26 16:54
Refs
FSDP
Fully Sharded Data Parallel: faster AI training with fewer GPUs
Training AI models at a large scale isn’t easy. Aside from the need for large amounts of computing power and resources, there is also considerable engineering complexity behind training very large …
https://engineering.fb.com/2021/07/15/open-source/fsdp/
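Optimizer state sharding (ZeRO stage 1, the idea underlying FSDP's optimizer sharding) partitions optimizer state such as Adam's moment buffers across data-parallel workers, so each rank stores only its 1/world_size slice instead of a full replica. Below is a minimal single-process sketch of that idea, under stated assumptions: the `rank` loop stands in for real distributed workers, `np.concatenate` stands in for an all-gather of updated shards, and all names (`WORLD_SIZE`, `adam_step`, etc.) are illustrative rather than any library's API.

```python
import numpy as np

WORLD_SIZE = 4
DIM = 16                       # total parameter count (divisible by WORLD_SIZE)
SHARD = DIM // WORLD_SIZE      # each rank owns one shard of the parameters

rng = np.random.default_rng(0)
params = rng.normal(size=DIM)  # full parameters, replicated on every rank

# Adam moment buffers are SHARDED: each rank allocates only SHARD entries,
# cutting optimizer-state memory per rank by a factor of WORLD_SIZE.
m = [np.zeros(SHARD) for _ in range(WORLD_SIZE)]
v = [np.zeros(SHARD) for _ in range(WORLD_SIZE)]

def adam_step(params, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """Each rank applies Adam to its own shard only; the updated shards
    are then combined (an all-gather in a real distributed setup)."""
    new_shards = []
    for rank in range(WORLD_SIZE):
        s = slice(rank * SHARD, (rank + 1) * SHARD)
        g = grad[s]
        m[rank] = b1 * m[rank] + (1 - b1) * g
        v[rank] = b2 * v[rank] + (1 - b2) * g * g
        mhat = m[rank] / (1 - b1 ** t)      # bias correction
        vhat = v[rank] / (1 - b2 ** t)
        new_shards.append(params[s] - lr * mhat / (np.sqrt(vhat) + eps))
    return np.concatenate(new_shards)       # stands in for all-gather

grad = 2 * params                           # gradient of loss = sum(params**2)
params = adam_step(params, grad, m, v, t=1)
```

Because every rank sees the same (already all-reduced) gradient and the shards are disjoint, the sharded step produces exactly the same parameters as unsharded Adam; only the memory layout of the optimizer state changes. In PyTorch this pattern is available as `torch.distributed.optim.ZeroRedundancyOptimizer`, and FSDP shards parameters and gradients as well.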
Copyright Seonglae Cho