Optimization stages for distributed training
DeepSpeed ZeRO

ZeRO
ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters - Microsoft Research
The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of cost, time, and ease of code integration. Microsoft released an open-source library called DeepSpeed, which vastly advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train models with over 100 billion parameters.
https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/

A game changer for large model training: Microsoft's DeepSpeed ZeRO-1, 2, 3 and ZeRO-Infinity
DeepSpeed ZeRO reduces the cost of large model training by making full use of heterogeneous computing.
https://moon-walker.medium.com/large-model-학습의-game-changer-ms의-deepspeed-zero-1-2-3-그리고-zero-infinity-74c9640190de

DeepSpeed ZeRO for efficient distributed training
DeepSpeed ZeRO, a deep learning optimization library that makes distributed training and inference easy, efficient, and effective
https://velog.io/@seoyeon96/리서치-효율적인-분산-학습을-위한-DeepSpeed-ZeRO

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Large deep learning models offer significant accuracy gains, but training billions to trillions of parameters is challenging. Existing solutions such as data and model parallelism exhibit fundamental limitations in fitting these models into limited device memory.
https://arxiv.org/abs/1910.02054
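
A minimal sketch of how a ZeRO stage is selected in a DeepSpeed config, assuming a PyTorch model and a launch through the `deepspeed` CLI; the placeholder model, batch size, and learning rate are illustrative, not from the sources above.

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # placeholder model for illustration

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                # 1: partition optimizer states, 2: + gradients, 3: + parameters
        "offload_optimizer": {     # ZeRO-Offload / ZeRO-Infinity style heterogeneous memory
            "device": "cpu"
        },
    },
}

# deepspeed.initialize wraps the model and optimizer with the chosen ZeRO stage
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Stage 3 additionally partitions the parameters themselves, and parameter offload (`offload_param`) is only available at that stage.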


Seonglae Cho