AI Agent RL

Creator

Seonglae Cho

Created

2026 Jan 1 16:59

Editor

Seonglae Cho

Edited

2026 Jan 1 17:0

Refs

A1: Explicit RL / RLHF / RLAIF using tool execution results as reward

T2: Indirect RL (teacher–student, distillation, preference learning) using agent output as a reward or critic signal

A2: Heavy use of supervised / preference-based fine-tuning

T1: Focus on learning the retrieval component or tool itself (contrastive, supervised)

https://arxiv.org/pdf/2512.16301v2
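
As a rough illustration of the A1 idea (tool execution results used as the reward), a minimal Python sketch might look like this. The names `ToolResult` and `execution_reward`, the exact-match check, and the reward values are all hypothetical choices for illustration; they are not taken from the linked paper.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Hypothetical record of one tool call made by the agent."""
    ok: bool      # True if the tool ran without raising an error
    output: str   # raw text returned by the tool


def execution_reward(result: ToolResult, expected: str) -> float:
    """Map a tool execution outcome to a scalar reward (assumed scheme).

    -1.0: the call itself failed (exception, timeout, bad arguments)
     0.0: the call ran but the output does not match the expected value
     1.0: the call ran and the output matches
    """
    if not result.ok:
        return -1.0
    return 1.0 if result.output.strip() == expected.strip() else 0.0
```

In an RL loop, a scalar like this would be attached to the trajectory that produced the tool call. The A2/T2 variants would instead score the agent's final output, for example with a learned critic or preference model.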
