
Megatron-LM

Creator
Seonglae Cho
Created
2023 Apr 25 14:07
Editor
Seonglae Cho
Edited
2024 Mar 8 15:58
Refs
Megatron-LM (NVIDIA) • Updated 2023 Apr 25 13:54
Megatron-LM uses tensor-slicing model parallelism. When DeepSpeed ZeRO-2's data parallelism is layered on top of Megatron-LM's tensor-slicing model parallelism, DeepSpeed trains 10x faster than Megatron-LM alone.
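The idea behind tensor slicing can be shown in a few lines of PyTorch. The sketch below is a single-process simulation of a Megatron-style column-parallel linear layer (the sizes, names, and the two-way split are illustrative, not Megatron-LM's actual API); in Megatron-LM each shard lives on a different GPU and the concatenation is a distributed all-gather.

import torch

torch.manual_seed(0)
hidden, ffn, world_size = 8, 16, 2      # toy sizes; ffn must be divisible by world_size

x = torch.randn(4, hidden)              # a batch of token activations
full_weight = torch.randn(ffn, hidden)  # the unsharded linear weight

# Column parallelism: split the weight's output dimension across "GPUs".
shards = torch.chunk(full_weight, world_size, dim=0)

# Each rank computes a partial output using only its slice of the weight.
partial_outputs = [x @ w_shard.t() for w_shard in shards]

# Concatenating the partials (an all-gather in the distributed case)
# recovers the full activation exactly.
y_parallel = torch.cat(partial_outputs, dim=-1)
y_reference = x @ full_weight.t()
print(torch.allclose(y_parallel, y_reference))  # True: slicing preserves the result

ZeRO-2 then partitions the optimizer states and gradients of these already-sliced parameters across the data-parallel replicas, which is the combination the 10x figure above refers to.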
The game changer for large-model training: Microsoft's DeepSpeed ZeRO-1, 2, 3 and ZeRO-Infinity
DeepSpeed ZeRO cuts the cost of training large models by actively exploiting heterogeneous computing (a minimal configuration sketch follows below).
https://moon-walker.medium.com/large-model-학습의-game-changer-ms의-deepspeed-zero-1-2-3-그리고-zero-infinity-74c9640190de
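As a rough illustration of how ZeRO is enabled in practice, here is a minimal sketch: the model, batch size, and learning rate are placeholders, the key names follow DeepSpeed's documented ZeRO options, and the script is normally started with the deepspeed launcher for multi-GPU runs.

import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)     # placeholder model

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                               # partition optimizer states and gradients
        "offload_optimizer": {"device": "cpu"},   # keep optimizer states in host RAM
    },
}

# deepspeed.initialize wraps the model in an engine that handles ZeRO
# partitioning, optimizer offload, and mixed precision during training.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

Stage 3 additionally partitions the parameters themselves, and ZeRO-Infinity extends the offload targets to NVMe, which is what the article above means by heterogeneous computing.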
Copyright Seonglae Cho