DeepSWE

Creator: Seonglae Cho
Created: 2025 Jul 10 20:23
Edited: 2025 Jul 10 20:28
A coding agent trained in an RL environment that simulates real software engineering work (code modifications, builds, tests, etc.). The agent solves problems using tools such as a terminal and a file editor.

Hybrid Verifier

Combines an execution-free approach (an LLM evaluates the patches) with an execution-based approach (the tests are actually run)
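
As a rough illustration of how the two signals might be combined, here is a minimal sketch. The `Candidate` fields, the `hybrid_score` function, and the linear weighting are all assumptions made for this example; the note does not say how DeepSWE actually merges the two verifiers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    patch_id: str
    tests_passed: int  # execution-based signal: tests that passed when actually run
    tests_total: int
    llm_score: float   # execution-free signal: hypothetical LLM judge score in [0, 1]


def hybrid_score(c: Candidate, weight: float = 0.5) -> float:
    """Blend the test pass rate with the LLM score (the linear blend is an assumption)."""
    pass_rate = c.tests_passed / c.tests_total if c.tests_total else 0.0
    return weight * pass_rate + (1.0 - weight) * c.llm_score


def select_best(candidates: list[Candidate], weight: float = 0.5) -> Candidate:
    """Pick the candidate patch with the highest blended score."""
    return max(candidates, key=lambda c: hybrid_score(c, weight))


candidates = [
    Candidate("a", tests_passed=3, tests_total=4, llm_score=0.2),
    Candidate("b", tests_passed=2, tests_total=4, llm_score=0.9),
    Candidate("c", tests_passed=4, tests_total=4, llm_score=0.1),
]
best = select_best(candidates)
```

With equal weighting, neither signal dominates: a patch that passes every test but is rated poorly by the LLM judge can lose to one with a weaker pass rate and a strong rating.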
 

GRPO++ (used to build DeepSWE, Together AI)

Drawing on ideas from DAPO, Dr.GRPO, LOOP/RLOO, and others, GRPO++ adds the following improvements:
  • Clip High (DAPO): Raise the upper clipping bound of the surrogate loss to encourage exploration
  • No KL Loss (DAPO)
  • No Reward Std, Length Normalization (Dr.GRPO)
  • Leave One Out: Reduce variance by leaving one sample out when estimating the advantage (LOOP/RLOO)
  • Compact Filtering: Mask trajectories that hit the max context length, max steps, or a timeout
  • No Entropy Loss: Remove entropy loss to prevent instability
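
Several of the items above can be sketched in a few lines of plain Python. This is a simplified, trajectory-level illustration, not the DeepSWE implementation: the function names are hypothetical, the clipping values `eps_low=0.2` and `eps_high=0.28` are illustrative assumptions, and real training code works on per-token probability ratios. The KL and entropy terms simply do not appear in the objective.

```python
def leave_one_out_advantages(rewards):
    """A_i = r_i - mean(r_j for j != i), with no division by the reward std."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped term with a larger upper bound (Clip High)."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def grpo_pp_objective(ratios, rewards, truncated, eps_low=0.2, eps_high=0.28):
    """Mean clipped surrogate over one prompt's samples.

    truncated[i] is True when trajectory i hit the context, step, or time limit;
    such trajectories are masked out (Compact Filtering). Keeping their rewards
    in the baseline is an assumption of this sketch.
    """
    advantages = leave_one_out_advantages(rewards)
    terms = [
        0.0 if cut else clipped_surrogate(rho, adv, eps_low, eps_high)
        for rho, adv, cut in zip(ratios, advantages, truncated)
    ]
    return sum(terms) / len(terms)


# Four sampled trajectories for one prompt; the third was cut off by a limit.
objective = grpo_pp_objective(
    ratios=[1.5, 1.0, 1.0, 1.0],
    rewards=[1.0, 0.0, 0.0, 1.0],
    truncated=[False, False, True, False],
)
```

Because the upper bound is `1 + eps_high` rather than `1 + eps_low`, a positively rewarded sample whose probability ratio has grown to 1.5 still contributes with ratio 1.28 instead of 1.2, which is the extra room for exploration that Clip High is meant to provide.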
 
 
 
