Proposes a framework in which a model evolves by generating and solving its own problems, without external data. A single base LLM is split into two roles, a Challenger (problem generator) and a Solver (problem solver). Each role is optimized with GRPO (Group Relative Policy Optimization, a reinforcement learning method), and the process repeats as a co-evolution loop.
https://www.arxiv.org/pdf/2508.05004
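
The summary above names GRPO and a Challenger/Solver loop without showing either, so here is a minimal sketch of both. It is not the paper's implementation. The first function shows the group-relative advantage step that gives GRPO its name: each reward is normalized against the mean and standard deviation of its own sampled group. The clipped policy-ratio objective and KL term are omitted, and the use of population standard deviation is an assumption, since implementations differ. `co_evolve`, `challenger_step`, and `solver_step` are hypothetical names standing in for the real training steps.

```python
from statistics import mean, pstdev
from typing import Callable, List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one sampled group.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def co_evolve(
    challenger_step: Callable[[], List[str]],
    solver_step: Callable[[List[str]], None],
    iterations: int,
) -> int:
    """Alternate the two roles for a fixed number of rounds.

    Both callables are hypothetical placeholders: in a real system each
    would sample from the model, score the samples, and apply a GRPO update.
    Returns the number of completed rounds.
    """
    rounds = 0
    for _ in range(iterations):
        problems = challenger_step()  # Challenger role proposes problems
        solver_step(problems)         # Solver role trains on those problems
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Toy group of four sampled answers: two rewarded, two not.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))

    # Toy loop with stub steps that only record what they were given.
    seen: List[List[str]] = []
    co_evolve(lambda: ["toy problem"], seen.append, iterations=3)
    print(len(seen))
```

Because advantages are computed relative to the group, a group whose samples all receive the same reward yields zero advantages and therefore no learning signal in this sketch.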

Seonglae Cho