Iterative pipeline combining CoT, answer pruning, and iterative SFT
Reported improvements on GSM8K:
- Gemma2-2B: 41.9 → 57.6 (+15.7%p)
- Gemma2-9B: 66.4 → 82.4 (+16.0%p)
- LLaMA-70B: 78.6 → 91.5 (+12.9%p)
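
The loop named in the title can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the names `prune_by_answer`, `iterate_pipeline`, `generate`, and `finetune` are hypothetical, and it assumes that "answer pruning" means keeping only CoT traces whose final answer matches the gold answer, and that each round fine-tunes the model produced by the previous round.

```python
from typing import Any, Callable, Dict, List, Tuple

Example = Tuple[str, str]      # (question, gold answer)
Trace = Tuple[str, str, str]   # (question, reasoning text, final answer)


def prune_by_answer(traces: List[Trace], gold: Dict[str, str]) -> List[Trace]:
    """Keep only traces whose final answer equals the gold answer (assumed pruning rule)."""
    return [t for t in traces if t[2] == gold[t[0]]]


def iterate_pipeline(
    generate: Callable[[Any, str], Trace],       # hypothetical: sample one CoT trace
    finetune: Callable[[Any, List[Trace]], Any], # hypothetical: SFT on the kept traces
    model: Any,
    dataset: List[Example],
    rounds: int = 3,
    samples_per_question: int = 4,
) -> Any:
    """Repeat: generate CoT traces, prune by answer, fine-tune on the survivors."""
    gold = dict(dataset)
    for _ in range(rounds):
        # 1. Sample several reasoning traces per question from the current model.
        traces = [
            generate(model, question)
            for question, _ in dataset
            for _ in range(samples_per_question)
        ]
        # 2. Discard traces that end in a wrong answer.
        kept = prune_by_answer(traces, gold)
        # 3. Fine-tune on the surviving traces; the result seeds the next round.
        model = finetune(model, kept)
    return model
```

In practice `generate` would wrap sampling from an LLM and `finetune` would wrap an SFT run; both are left as caller-supplied functions here so the control flow stays visible.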
https://arxiv.org/pdf/2504.18116

Seonglae Cho