RLER

Creator
Seonglae Cho
Created
2026 Jan 1 11:45
Edited
2026 Jan 3 22:7

DR Tulu

A fully open training recipe for long-form, tool-using deep research agents. The model learns planning, retrieval, synthesis, and citation end to end via SFT + RLER (Reinforcement Learning with Evolving Rubrics). It automatically selects among several MCP-based tools (web search, browsing, paper search) and grounds its responses with citations. RLER provides rewards for each reasoning step using an LLM-as-a-judge whose rubrics evolve during training.
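The rubric-based reward idea can be sketched roughly as follows. This is a minimal illustration under assumptions, not the DR Tulu implementation: the `Rubric` type, `keyword_judge` (a toy stand-in for an LLM judge), the weighted-average reward, and the variance-based filtering in `evolve_rubrics` are all hypothetical choices made here to show how a rubric pool might score responses and change between training steps.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import Callable, List


@dataclass(frozen=True)
class Rubric:
    """One judging criterion (hypothetical structure)."""
    text: str
    weight: float = 1.0


# A judge maps (response, rubric) to a score in [0, 1].
Judge = Callable[[str, Rubric], float]


def keyword_judge(response: str, rubric: Rubric) -> float:
    """Toy stand-in for an LLM judge: 1.0 if the rubric text occurs in the response."""
    return 1.0 if rubric.text.lower() in response.lower() else 0.0


def rubric_reward(response: str, rubrics: List[Rubric], judge: Judge) -> float:
    """Weighted average of judge scores over the current rubric pool."""
    total = sum(r.weight for r in rubrics)
    if total == 0:
        return 0.0
    return sum(r.weight * judge(response, r) for r in rubrics) / total


def evolve_rubrics(
    rubrics: List[Rubric],
    rollouts: List[str],
    proposed: List[Rubric],
    judge: Judge,
    max_size: int = 8,
) -> List[Rubric]:
    """Merge newly proposed rubrics, then keep only the discriminative ones.

    Assumption: a rubric that scores every rollout identically carries no
    learning signal, so it is dropped from the pool.
    """
    pool = list(dict.fromkeys(rubrics + proposed))  # de-duplicate, keep order
    if not rollouts:
        return pool[:max_size]

    def spread(r: Rubric) -> float:
        return pvariance([judge(x, r) for x in rollouts])

    kept = [r for r in pool if spread(r) > 0.0]
    kept.sort(key=spread, reverse=True)  # most discriminative first
    return kept[:max_size]
```

In a real setup the judge would be an LLM call and the proposed rubrics would be generated from the policy's latest rollouts; here both are stubbed so the control flow stays visible.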
 
 
 
 
