R-Zero

Creator

Seonglae Cho

Created

2025 Aug 25 21:56

Editor

Seonglae Cho

Edited

2025 Aug 25 21:57

Refs

R-Zero proposes a framework in which an LLM improves itself by generating and solving its own problems, with no external data. A single base LLM is split into two roles, a Challenger (problem generator) and a Solver (problem solver). Each role is optimized with GRPO (Group Relative Policy Optimization, a reinforcement learning method), and the two roles are updated repeatedly in a co-evolution loop.
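A minimal Python sketch of the signals such a loop might use. The GRPO part reflects the standard idea: each sampled output's reward is normalized against the mean and standard deviation of its own group, so no learned value model is needed. The Challenger and Solver parts are illustrative assumptions, not a confirmed reproduction of the paper's method. With no ground-truth labels, the Solver's majority answer is treated as a pseudo-label, and the Challenger is rewarded most when the Solver's sampled answers agree with that majority about half the time, which represents a question at the edge of the Solver's ability. The function names `group_relative_advantages`, `majority_pseudo_label`, and `uncertainty_reward` are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward against its own group.

    Uses the population standard deviation; eps avoids division by zero
    when every reward in the group is identical (all advantages become 0).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def majority_pseudo_label(answers):
    """Hypothetical Solver labeling: most common answer and its agreement rate."""
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)


def uncertainty_reward(answers):
    """Hypothetical Challenger reward: 1.0 at 50% agreement, 0.0 at full agreement."""
    _, p_hat = majority_pseudo_label(answers)
    return 1.0 - 2.0 * abs(p_hat - 0.5)


if __name__ == "__main__":
    # Four Solver samples for one Challenger question.
    samples = ["4", "4", "5", "3"]
    print(majority_pseudo_label(samples))  # ('4', 0.5)
    print(uncertainty_reward(samples))     # 1.0
    # Solver rewards (1 = matches pseudo-label) turned into GRPO advantages.
    print(group_relative_advantages([1.0, 1.0, 0.0, 0.0]))
```

Normalizing within the group means only the relative quality of samples for the same prompt matters, which is why GRPO pairs naturally with self-generated rewards like these that have no absolute scale.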


https://www.arxiv.org/pdf/2508.05004
