- Tokenizer Training (Rust BPE)
    - Train a BPE tokenizer on FineWeb-EDU data (tokenizer sketch after this list)
    - Configure vocab size + special tokens
- Base Pretraining
    - Train a general language model on large-scale text (FineWeb-EDU)
    - Next-token prediction objective (loss sketch below)
- Mid-training (Intermediate Adaptation Stage)
    - Use mixed data from SmolTalk (conversation), MMLU, GSM8K (data-mixing sketch below)
    - Expand the model's internal reasoning and world knowledge, without loss masking or chat special tokens
- SFT (Supervised Fine-Tuning, Chat Format)
    - Use user ↔ assistant conversation-format data
    - Mask user messages; backpropagate loss on assistant tokens only (masking sketch below)
- Optional RL (GRPO / REINFORCE)
    - Update the model based on rewards for GSM8K problems
    - Generate multiple answer samples → apply a policy-gradient update (GRPO sketch below)
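A minimal sketch of the tokenizer-training step, using the Rust-backed Hugging Face `tokenizers` library as a stand-in for a custom Rust BPE trainer; the vocab size, special-token names, and file paths are assumptions.

```python
# BPE tokenizer training sketch (Rust-backed `tokenizers` library).
# Vocab size, special tokens, and the shard filename are illustrative assumptions.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32_768,                                   # assumed vocab size
    special_tokens=["<|bos|>", "<|user|>", "<|assistant|>", "<|end|>"],  # assumed chat tokens
)

# Train on local FineWeb-EDU text shards (hypothetical file list), then save.
tokenizer.train(files=["fineweb_edu_shard_00.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```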
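A minimal sketch of the next-token prediction objective in PyTorch; `model` stands for a hypothetical causal LM that returns logits of shape (batch, seq_len, vocab_size).

```python
# Next-token prediction loss sketch (PyTorch).
import torch
import torch.nn.functional as F

def pretrain_loss(model, tokens):
    """tokens: (batch, seq_len) int64 tensor of token ids."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]    # shift targets by one position
    logits = model(inputs)                             # (batch, seq_len-1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),           # flatten for cross-entropy
        targets.reshape(-1),
    )
```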
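A minimal data-mixing sketch for the mid-training stage; the 0.6/0.2/0.2 sampling weights are assumptions, and the three arguments stand for lists of pre-rendered text documents from SmolTalk, MMLU, and GSM8K.

```python
# Mid-training data mixing sketch: sample one document from a weighted pool.
# The sampling weights below are illustrative assumptions.
import random

def sample_midtraining_example(smoltalk, mmlu, gsm8k, rng=random):
    """smoltalk / mmlu / gsm8k: lists of plain-text training documents."""
    pools = [(smoltalk, 0.6), (mmlu, 0.2), (gsm8k, 0.2)]   # assumed mixture weights
    docs, weights = zip(*pools)
    pool = rng.choices(docs, weights=weights, k=1)[0]       # pick a source by weight
    return rng.choice(pool)                                  # pick a document from it
```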
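A minimal sketch of the SFT loss masking in PyTorch: user tokens are excluded from the loss by labeling them with the ignore index (-100), so gradients flow only through assistant tokens. The chat rendering with special tokens and the `encode` helper are assumptions.

```python
# SFT masking sketch: only assistant tokens contribute to the loss.
import torch
import torch.nn.functional as F

def build_sft_example(encode, conversation):
    """conversation: list of (role, text) pairs, roles 'user' / 'assistant'."""
    input_ids, labels = [], []
    for role, text in conversation:
        ids = encode(f"<|{role}|>{text}<|end|>")       # assumed chat rendering
        input_ids += ids
        # Mask user tokens: -100 means "ignore" for cross_entropy below.
        labels += ids if role == "assistant" else [-100] * len(ids)
    return torch.tensor(input_ids), torch.tensor(labels)

def sft_loss(model, input_ids, labels):
    logits = model(input_ids[None, :-1])               # (1, T-1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels[None, 1:].reshape(-1),
        ignore_index=-100,                             # skip masked (user) positions
    )
```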
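A minimal GRPO-style sketch for the optional RL stage: sample several answers per GSM8K problem, reward the correct ones, normalize rewards within the group, and weight each sample's log-likelihood by its advantage. `generate`, `logprob_sum`, and `is_correct` are hypothetical helpers; without the group-mean baseline this reduces to plain REINFORCE.

```python
# GRPO-style policy-gradient loss sketch.
import torch

def grpo_loss(model, prompt, reference_answer,
              generate, logprob_sum, is_correct, num_samples=8):
    # Sample a group of candidate answers for the same prompt.
    samples = [generate(model, prompt) for _ in range(num_samples)]
    rewards = torch.tensor([1.0 if is_correct(s, reference_answer) else 0.0
                            for s in samples])
    # Group-relative advantage: subtract the group mean, optionally scale by std.
    adv = rewards - rewards.mean()
    if rewards.std() > 0:
        adv = adv / rewards.std()
    # REINFORCE-style objective: maximize advantage-weighted log-likelihood.
    logps = torch.stack([logprob_sum(model, prompt, s) for s in samples])
    return -(adv.detach() * logps).mean()
```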

Seonglae Cho