AI Agent RL

Creator
Seonglae Cho
Created
2026 Jan 1 16:59
Edited
2026 Jan 1 17:0
 
 
 
 
  • A1: Explicit RL, RLHF, or RLAIF that uses tool execution results as the reward
  • T2: Indirect RL (teacher–student training, distillation, preference learning) that uses the agent's output as the reward and critic signal
  • A2: Relies heavily on supervised or preference-based fine-tuning
  • T1: Focuses on training the retrieval component and the tool itself (contrastive or supervised objectives)
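
To make the A1 case concrete, here is a minimal sketch of a reward computed from a tool execution result. It assumes a binary reward and a toy tool registry. The names TOOLS and tool_execution_reward are hypothetical and chosen only for illustration; they do not come from any particular framework.

```python
from typing import Callable, Dict, Sequence

# Hypothetical tool registry: names and behaviours are illustrative only.
TOOLS: Dict[str, Callable[..., float]] = {
    "add": lambda a, b: a + b,
    "div": lambda a, b: a / b,
}


def tool_execution_reward(tool_name: str, args: Sequence[float], expected: float) -> float:
    """A1-style reward: 1.0 if the executed tool call yields the expected result, else 0.0."""
    try:
        result = TOOLS[tool_name](*args)
    except (KeyError, TypeError, ZeroDivisionError):
        # Unknown tool, wrong number of arguments, or a runtime failure: zero reward.
        return 0.0
    return 1.0 if abs(result - expected) < 1e-9 else 0.0
```

In an RL loop, the agent's sampled tool call would be scored this way and the scalar passed to the policy update. A real setup would likely use a sandboxed executor and a richer reward than a pass/fail signal.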
 
