Self-play SWE-RL

Using no human data, an Injector generates bugs and a Solver fixes them autonomously in real codebases, and the two roles are co-trained with RL. The only required input is a Docker-sandboxed repository. A bug is formally specified as a code patch plus a test-weakening patch. Limitations: verification is only as strong as the tests, training is unstable at scale, and the single-model versus separate-role setup remains unexplored.
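The bug specification above (code patch + test-weakening patch) can be sketched as a small data structure. This is a toy illustration, not the paper's implementation: the names Repo, BugArtifact, is_valid_bug and solver_reward are hypothetical, the "repo" is an in-memory stand-in for a Docker-sandboxed codebase, and the validity rule (original tests catch the bug, weakened tests do not) and the 0/1 reward are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple

# Hypothetical stand-in for a sandboxed repository: one function under
# test plus test cases given as (a, b, expected) triples.
@dataclass(frozen=True)
class Repo:
    impl: Callable[[int, int], int]
    tests: Tuple[Tuple[int, int, int], ...]

Patch = Callable[[Repo], Repo]  # a patch returns a modified copy of the repo

@dataclass(frozen=True)
class BugArtifact:
    code_patch: Patch            # introduces the bug into the code
    test_weakening_patch: Patch  # weakens the tests so the bug goes unnoticed

def run_tests(repo: Repo) -> bool:
    """Return True if every test case in the repo passes."""
    return all(repo.impl(a, b) == expected for a, b, expected in repo.tests)

def is_valid_bug(clean: Repo, bug: BugArtifact) -> bool:
    """Assumed validity rule: the clean repo passes, the original tests
    catch the bug, and the weakened tests no longer catch it."""
    buggy = bug.code_patch(clean)
    hidden = bug.test_weakening_patch(buggy)
    return run_tests(clean) and not run_tests(buggy) and run_tests(hidden)

def solver_reward(clean: Repo, fixed: Repo) -> float:
    """Assumed Solver reward: 1.0 if the fix passes the original tests."""
    return 1.0 if run_tests(replace(fixed, tests=clean.tests)) else 0.0

# Toy example: the Injector flips '+' to '-' and drops the test that notices.
clean = Repo(impl=lambda a, b: a + b, tests=((1, 2, 3), (0, 0, 0)))
bug = BugArtifact(
    code_patch=lambda r: replace(r, impl=lambda a, b: a - b),
    test_weakening_patch=lambda r: replace(r, tests=((0, 0, 0),)),
)
buggy = bug.test_weakening_patch(bug.code_patch(clean))
fixed = replace(buggy, impl=lambda a, b: a + b)  # a correct Solver patch
valid = is_valid_bug(clean, bug)
reward = solver_reward(clean, fixed)
```

In this sketch the Solver is scored against the original, unweakened tests, which is one simple way to capture the test-based verification limit noted above: a fix is judged only by what the tests check.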

SWE-bench Verified: +10.4 percentage points

SWE-Bench Pro: +7.8 percentage points

https://arxiv.org/pdf/2512.18552
