nanoChat

Creator: Seonglae Cho
Created: 2025 Oct 14 21:31
Edited: 2025 Oct 14 21:45
  • Tokenizer Training (Rust BPE)
    • Train a BPE tokenizer on FineWeb-EDU data (see the first sketch after this list)
    • Configure the vocab size and special tokens
  • Base Pretraining
    • Train a general language model on large-scale text (FineWeb-EDU)
    • Next-token prediction objective (second sketch below)
  • Mid-training (Intermediate Adaptation Stage)
    • Use mixed data from SmolTalk (conversation), MMLU, and GSM8K
    • Expand the model's internal reasoning and world knowledge; conversations are rendered with the chat special tokens but trained without loss masking (third sketch below)
  • SFT (Supervised Fine-Tuning, Chat Format)
    • Use user ↔ assistant conversation-format data
    • Mask user messages and backpropagate on assistant tokens only (fourth sketch below)
  • Optional RL (GRPO / REINFORCE)
    • Update the model from rewards on GSM8K problems
    • Generate multiple answer samples → apply a policy gradient (final sketch below)
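A minimal BPE-training sketch. nanoChat trains its tokenizer with a custom Rust BPE crate; the Hugging Face `tokenizers` library stands in here only to mirror the idea, and the vocab size, special tokens, and file paths are illustrative assumptions.

```python
# BPE training sketch (stand-in for nanoChat's Rust BPE trainer).
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Assumed special tokens; check nanoChat's tokenizer config for the real set.
SPECIAL_TOKENS = [
    "<|bos|>",
    "<|user_start|>", "<|user_end|>",
    "<|assistant_start|>", "<|assistant_end|>",
]

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=65536,               # assumed; choose to fit your budget
    special_tokens=SPECIAL_TOKENS,  # reserved IDs, excluded from merges
)

# Placeholder paths to FineWeb-EDU text shards.
tokenizer.train(["fineweb_edu_shard_0.txt"], trainer)
tokenizer.save("tokenizer.json")
```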
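The pretraining objective in one function: shift the token stream by one position and minimize cross-entropy. A PyTorch sketch assuming `model` is any causal LM that returns logits over the vocabulary.

```python
# Next-token prediction step for base pretraining.
import torch
import torch.nn.functional as F

def pretrain_step(model, tokens):  # tokens: (B, T+1) int64 token IDs
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one
    logits = model(inputs)                           # (B, T, vocab_size)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
    loss.backward()
    return loss.item()
```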
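Mid-training reuses the same next-token loss; what changes is the data. A sketch of rendering a SmolTalk-style conversation into the chat format, with the special-token names assumed rather than taken from nanoChat:

```python
# Serialize a conversation with chat special tokens; the result is tokenized
# and fed to pretrain_step above -- every token is supervised, no masking.
def render_conversation(messages):
    """messages: list of {'role': 'user' | 'assistant', 'content': str}"""
    text = "<|bos|>"
    for m in messages:
        text += f"<|{m['role']}_start|>{m['content']}<|{m['role']}_end|>"
    return text
```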
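The SFT difference is loss masking. A sketch using the common `ignore_index=-100` convention so user tokens contribute no gradient; the assistant-token mask is assumed to come from the chat renderer.

```python
# SFT step: supervise assistant tokens only.
import torch
import torch.nn.functional as F

IGNORE = -100  # conventional ignore label for cross_entropy

def sft_step(model, tokens, is_assistant):
    # tokens: (B, T+1) int64; is_assistant: (B, T+1) bool, True on assistant turns
    inputs = tokens[:, :-1]
    targets = tokens[:, 1:]
    labels = torch.where(
        is_assistant[:, 1:], targets, torch.full_like(targets, IGNORE)
    )
    logits = model(inputs)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=IGNORE,  # user tokens are skipped in the loss
    )
    loss.backward()
    return loss.item()
```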
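Finally, a GRPO-style update in miniature: sample a group of answers per prompt, score them, normalize rewards within the group, and weight sequence log-probs by the advantage. `model.generate`, `model.sequence_logprob`, and `reward_fn` are hypothetical helpers, not nanoChat's API.

```python
# Group-relative policy gradient sketch (GRPO-flavored REINFORCE).
import torch

def grpo_step(model, prompt_ids, reward_fn, group_size=8):
    # Sample a group of candidate answers for one GSM8K prompt.
    samples = [model.generate(prompt_ids) for _ in range(group_size)]
    # reward_fn returns 1.0 if the final answer is correct, else 0.0.
    rewards = torch.tensor([reward_fn(s) for s in samples])
    # Group-relative advantage: group mean as baseline, std-normalized as in GRPO.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    loss = torch.zeros(())
    for sample, a in zip(samples, adv):
        logp = model.sequence_logprob(prompt_ids, sample)  # sum of token log-probs
        loss = loss - a * logp
    (loss / group_size).backward()
    return rewards.mean().item()
```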