Getting Started with Fully Sharded Data Parallel (FSDP) — PyTorch Tutorials 2.2.1+cu121 documentation
Training AI models at large scale is challenging and requires substantial compute power and resources.
Handling the training of these very large models also adds considerable engineering complexity.
PyTorch FSDP, released in PyTorch 1.11, makes this easier.
https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html