
20240312 Yokhal Distributed Training

Date
2024 Mar 16 0:0 → 2024 Mar 18 0:0
Created by
Seonglae Cho
Created time
2024 Mar 12 12:36
Last edited by
Seonglae Cho
Last edited time
2024 Mar 26 5:21
Refs

Todo list

Add nanogpt transformer
Try publishing to pip so the model can be loaded? (autoModel)
Fix the issue where streaming does not work

Next week

DDP with nanogpt
FSDP with accelerate
Train mistral
Add special tokens to small nano and train on chat data?
DPO training
Register on the Korean LLM leaderboard or run several benchmarks
Final goal: study everything such as groupgpt, then add those techniques to nanogpt to build the hanGPT model_type
Add tinybenchmark to evaluate and run the evaluation
 
How about a guy who can freely run multi-node distributed training on four RTX 3090s?

Best practice

Fine-tuning Llama 2 70B using PyTorch FSDP
Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel
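
A rough sizing sketch for the FSDP items above, not taken from these notes. It assumes mixed-precision Adam at about 16 bytes of model state per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights, momentum, and variance) and that full sharding splits those states evenly across GPUs. Activations, buffers, and fragmentation are ignored, and the function name and the parameter counts are hypothetical.

```python
# Assumption: mixed-precision Adam, ~16 bytes of model state per parameter
# (2 fp16 weights + 2 fp16 grads + 12 fp32 master weights/momentum/variance).
# Activations, buffers, and allocator fragmentation are NOT counted.
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_gib(num_params: float, num_gpus: int = 1) -> float:
    """Model-state memory per GPU in GiB, assuming states are evenly sharded."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return num_params * BYTES_PER_PARAM / num_gpus / 2**30


if __name__ == "__main__":
    # Hypothetical model sizes, for illustration only.
    for label, params in [("124M params", 124e6), ("7B params", 7e9)]:
        for gpus in (1, 4):
            gib = model_state_gib(params, gpus)
            print(f"{label}, {gpus} GPU(s): {gib:.2f} GiB of model state per GPU")
```

Comparing the per-GPU figure against the 24 GB of an RTX 3090 gives a first guess at whether a model needs full sharding, CPU offload, or a parameter-efficient method before any run is launched.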
fsdp.py

fsdp final

