Agent0

Creator: Seonglae Cho
Created: 2025 Dec 17 12:26
Edited: 2025 Dec 17 12:27
Two agents starting from the same base LLM are co-evolved:
  • Curriculum Agent: Uses RL to generate frontier tasks where the Executor has the highest uncertainty
  • Executor Agent: Learns to solve those tasks via RL
Tool integration as growth engine: when a code-interpreter tool is added to the Executor, its problem-solving ability improves. This pressures the Curriculum to become more tool-aware and pose harder problems, creating a virtuous cycle in which difficulty and capability rise together.
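
To make the loop concrete, here is a minimal runnable sketch. Everything in it is an illustrative assumption standing in for the actual RL machinery: `ToyAgent`, its `skill`/`has_tool` knobs, and the multiplicative update rules are toy stand-ins, not the paper's implementation.

```python
import random

random.seed(0)  # reproducible toy run

class ToyAgent:
    """Toy stand-in for an LLM policy; purely illustrative."""
    def __init__(self, skill: float = 1.0, has_tool: bool = False):
        self.skill = skill
        self.has_tool = has_tool

    def solve_rate(self, difficulty: float) -> float:
        # A code-interpreter tool boosts effective skill, which in turn
        # pressures the curriculum to raise task difficulty.
        effective = self.skill * (1.5 if self.has_tool else 1.0)
        return effective / (effective + difficulty)

def co_evolve(rounds: int = 10, samples: int = 16) -> None:
    executor = ToyAgent(has_tool=True)
    difficulty = 1.0  # the Curriculum's current task frontier
    for r in range(rounds):
        # Sampled attempts on a frontier task; the empirical success rate
        # stands in for the self-consistency estimate p_hat.
        p_hat = sum(random.random() < executor.solve_rate(difficulty)
                    for _ in range(samples)) / samples
        # The Curriculum's reward peaks at p_hat = 0.5, so it nudges
        # difficulty toward the Executor's frontier ...
        difficulty *= 1.1 if p_hat > 0.5 else 0.9
        # ... while RL on those frontier tasks improves the Executor.
        executor.skill *= 1.05
        print(f"round {r}: difficulty={difficulty:.2f}, p_hat={p_hat:.2f}")

co_evolve()
```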

Reward/Learning Design:

  • Curriculum rewards: (1) the Executor's self-consistency-based uncertainty (maximal when p̂ is near 0.5), (2) tool-usage frequency, and (3) a repetition penalty that encourages task diversity.
  • Executor learning: the Executor filters tasks by p̂ to train only on data that is "neither too easy nor too hard," uses majority-vote pseudo-labels, and applies ambiguity-aware ADPO to scale update strength and reduce pseudo-label noise.
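
A sketch of this reward and filtering logic under stated assumptions: the triangular shape of the uncertainty term, the composite weights, the filter thresholds, and the `adpo_weight` form are all illustrative. The source only fixes that the uncertainty reward peaks at p̂ = 0.5 and that update strength is adjusted by ambiguity.

```python
from collections import Counter

def uncertainty_reward(p_hat: float) -> float:
    """Curriculum reward term (1): maximal at p_hat = 0.5, zero at 0 and 1.
    The triangular shape is an assumption; the source only fixes the peak."""
    return 1.0 - 2.0 * abs(p_hat - 0.5)

def curriculum_reward(p_hat: float, tool_calls: int, is_repeat: bool,
                      w_tool: float = 0.1, rep_penalty: float = 0.5) -> float:
    """Composite curriculum reward: uncertainty + tool-usage bonus
    - repetition penalty. Weights are illustrative, not the paper's."""
    return (uncertainty_reward(p_hat)
            + w_tool * tool_calls
            - (rep_penalty if is_repeat else 0.0))

def majority_pseudolabel(answers: list[str]) -> tuple[str, float]:
    """Majority-vote pseudo-label and its agreement rate (the p_hat estimate)."""
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)

def keep_task(p_hat: float, low: float = 0.3, high: float = 0.8) -> bool:
    """Executor-side filter: train only on tasks that are neither too easy
    nor too hard (thresholds are illustrative)."""
    return low <= p_hat <= high

def adpo_weight(p_hat: float) -> float:
    """Ambiguity-aware update scaling: more confident majority votes drive
    stronger updates. One plausible reading of ADPO's intent; the exact
    formula is not reproduced here."""
    return p_hat

# Example: 8 sampled Executor answers to one generated task.
answers = ["42", "42", "41", "42", "42", "41", "42", "41"]
label, p_hat = majority_pseudolabel(answers)            # label="42", p_hat=0.625
print(curriculum_reward(p_hat, tool_calls=2, is_repeat=False))  # ~0.95
print(keep_task(p_hat), adpo_weight(p_hat))             # True 0.625
```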