A wrap policy is the policy that decides how FSDP splits a model into units to shard.
PyTorch ships with a built-in auto-wrap policy for transformer models.
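A minimal sketch of how that policy is applied, following the pattern from the FSDP tutorial linked below; the toy `Block` module and the single-process gloo setup are illustrative assumptions, not part of the original note:

```python
# Sketch: wrapping a transformer with FSDP's built-in transformer_auto_wrap_policy.
# Each `Block` instance becomes its own FSDP unit, so its parameters are
# all-gathered and freed together during forward/backward.
import functools
import os

import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


class Block(nn.Module):
    """Toy stand-in for a transformer block (e.g. GPT2Block in practice)."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        return x + self.mlp(x + attn_out)


def main():
    # Single-process process group purely for illustration; real runs use torchrun.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = nn.Sequential(*[Block() for _ in range(4)])

    # The wrap policy determines the sharding granularity: here, one FSDP unit
    # per transformer layer class listed in `transformer_layer_cls`.
    policy = functools.partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})
    fsdp_model = FSDP(model, auto_wrap_policy=policy)
    print(fsdp_model)  # shows nested FSDP units, one per Block

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```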
Getting Started with Fully Sharded Data Parallel (FSDP) — PyTorch Tutorials
Training AI models at a large scale is a challenging task that requires a lot of compute power and resources.
It also comes with considerable engineering complexity to handle the training of these very large models. PyTorch FSDP, released in PyTorch 1.11, makes this easier.
https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html
FSDP2
Supercharging Training using float8 and FSDP2
IBM: Tuan Hoang Trong, Alexei Karve, Yan Koyfman, Linsong Chu, Divya Kumari, Shweta Salaria, Robert Walkup, Praneet Adusumilli, Nirmit Desai, Raghu Ganti, Seetharami Seelam; Meta: Less Wright, Wei Feng, Vasiliy Kuznetsov, Driss Guesseous
https://pytorch.org/blog/training-using-float8-fsdp2/


Seonglae Cho