Self-play SWE-RL

Creator: Seonglae Cho
Created: 2026 Jan 1 16:26
Edited: 2026 Jan 1 16:27
Self-play SWE-RL needs no human-written data: one role (the Injector) generates bugs in real codebases and another (the Solver) fixes them, and the two are co-trained with RL. The only required input is a Docker-sandboxed repo. A bug is formally specified as a code patch plus a test-weakening patch. Limitations: verification is only as strong as the tests, training is unstable at scale, and the choice between a single model for both roles and separate per-role models is unexplored.
  • SWE-bench Verified: +10.4 percentage points
  • SWE-Bench Pro: +7.8 percentage points
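
The bug specification (code patch + test-weakening patch) can be pictured as a small data structure with a validity check. This is a minimal sketch, not the method's actual implementation: the names `BugSpec`, `is_valid_bug`, and `solver_reward` are hypothetical, and both the validity rule and the binary reward are assumptions made for illustration. Test outcomes are passed in as plain dictionaries, so running anything inside the Docker sandbox is left out.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical names throughout: nothing here is taken from the source.
TestResults = Dict[str, bool]  # test name -> did the test pass?


@dataclass(frozen=True)
class BugSpec:
    code_patch: str            # diff that introduces the bug
    test_weakening_patch: str  # diff that removes or relaxes tests exposing it


def is_valid_bug(original_tests_on_buggy: TestResults,
                 weakened_tests_on_buggy: TestResults) -> bool:
    """Assumed rule: the bug fails at least one original test,
    and the weakened test suite no longer detects it."""
    bug_is_observable = (bool(original_tests_on_buggy)
                         and not all(original_tests_on_buggy.values()))
    bug_is_hidden = all(weakened_tests_on_buggy.values())
    return bug_is_observable and bug_is_hidden


def solver_reward(original_tests_after_fix: TestResults) -> float:
    """Assumed binary reward: 1.0 only if every original test passes."""
    if not original_tests_after_fix:
        return 0.0
    return 1.0 if all(original_tests_after_fix.values()) else 0.0


if __name__ == "__main__":
    bug = BugSpec(code_patch="<diff>", test_weakening_patch="<diff>")
    valid = is_valid_bug({"test_add": False, "test_sub": True},
                         {"test_sub": True})
    print(bug, valid, solver_reward({"test_add": True, "test_sub": True}))
```

Under these assumptions the Injector is judged on whether `is_valid_bug` holds for its patches, and the Solver on whether its fix restores the original, unweakened tests. This also makes the first limitation concrete: the signal is only as reliable as the tests.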