Optimizer State Sharding

Creator

Seonglae Cho

Created

2024 Feb 26 16:53

Editor

Seonglae Cho

Edited

2024 Feb 26 16:54

Refs

Fully Sharded Data Parallel: faster AI training with fewer GPUs

Training AI models at a large scale isn’t easy. Aside from the need for large amounts of computing power and resources, there is also considerable engineering complexity behind training very large …

https://engineering.fb.com/2021/07/15/open-source/fsdp/
