R-Zero is a proposed framework in which an LLM evolves by creating and solving its own problems, without external data. A single base LLM is split into two roles, a Challenger (problem generator) and a Solver (problem solver). Each role is optimized with GRPO (Group Relative Policy Optimization, a reinforcement learning technique), and the cycle is repeated so that the two co-evolve.
R-Zero
Created: 2025 Aug 25 21:56
Edited: 2025 Aug 25 21:57
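
As a rough illustration of the GRPO step mentioned in the summary, the sketch below computes group-relative advantages: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its own group. This is only a minimal sketch of that one normalization step, not R-Zero's training code. The function name `group_relative_advantages`, the `eps` parameter, the choice of population standard deviation, and the sample rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: the name, the eps value, and the use of the
    population standard deviation are illustrative assumptions.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four Solver answers to one Challenger question.
example = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Responses that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.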