RLER

DR Tulu

A fully open training recipe for long-form, tool-using deep research agents. Planning, retrieval, synthesis, and citation are learned end to end via SFT + RLER (Reinforcement Learning with Evolving Rubrics). The agent automatically selects among MCP-based tools (web search, browsing, paper search) and grounds its responses with citations. RLER provides a reward for each reasoning step, using an LLM-as-a-judge with evolving rubrics.
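To make the idea of rubric-based rewards with an evolving rubric set more concrete, here is a minimal sketch. It is not the DR Tulu implementation. Every name in it (`Rubric`, `toy_judge`, `rubric_reward`, `evolve_rubrics`, `max_size`) is hypothetical, the judge is a keyword-matching stand-in for a real LLM-as-a-judge call, and the rule of keeping only rubrics whose scores differ across rollouts is an assumption chosen for illustration. The unit being scored is a plain string, which could be a full response or a single reasoning step.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Callable, List


@dataclass(frozen=True)
class Rubric:
    text: str     # criterion that would be shown to the judge
    keyword: str  # used only by the toy judge below (hypothetical field)


# A judge maps (text to score, rubric) to a score in [0, 1].
Judge = Callable[[str, Rubric], float]


def toy_judge(response: str, rubric: Rubric) -> float:
    """Stand-in for an LLM-as-a-judge call: 1.0 if the keyword appears."""
    return 1.0 if rubric.keyword.lower() in response.lower() else 0.0


def rubric_reward(response: str, rubrics: List[Rubric], judge: Judge) -> float:
    """Reward = mean judge score over the current rubric set."""
    if not rubrics:
        return 0.0
    return sum(judge(response, r) for r in rubrics) / len(rubrics)


def evolve_rubrics(
    rubrics: List[Rubric],
    candidates: List[Rubric],
    rollouts: List[str],
    judge: Judge,
    max_size: int = 4,
) -> List[Rubric]:
    """Merge candidate rubrics into the pool, then keep the most discriminative.

    Assumption for this sketch: a rubric that gives every rollout the same
    score carries no learning signal, so it is dropped.
    """
    pool = list(dict.fromkeys(rubrics + candidates))  # dedupe, keep order
    if not rollouts:
        return pool[:max_size]
    scored = []
    for r in pool:
        spread = pstdev([judge(resp, r) for resp in rollouts])
        if spread > 0:
            scored.append((spread, r))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [r for _, r in scored[:max_size]]


rollouts = [
    "Transformers use attention [1].",
    "Transformers are neural networks.",
]
rubrics = [
    Rubric("Mentions transformers", "transformers"),
    Rubric("Cites a source", "[1]"),
]
candidates = [Rubric("Explains attention", "attention")]

evolved = evolve_rubrics(rubrics, candidates, rollouts, toy_judge)
rewards = [rubric_reward(r, evolved, toy_judge) for r in rollouts]
```

In this toy run the "Mentions transformers" rubric is dropped because both rollouts satisfy it, while the citation and attention rubrics remain because they separate the two rollouts. In a real setup the judge and the candidate rubrics would both come from an LLM rather than from keywords.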

DR Tulu: An open, end-to-end training recipe for long-form deep research | Ai2

We introduce Deep Research Tulu (DR Tulu), an open post-training recipe and framework for long-form deep research agents.

https://allenai.org/blog/dr-tulu

arxiv.org

https://arxiv.org/pdf/2511.19399
