Language Model RL

Creator: Seonglae Cho
Created: 2023 Sep 9 17:8
Edited: 2025 Oct 28 0:25
All RLHF-like language model RL methods have this to prevent AI Reward Hacking.
Language Model RL Methods
 
 
Language Model Reward Benchmarks
 
 
 
Reinforcement Learning Transformers
 
 
Language Model RL Frameworks
 
 

Era of Experience

However, RL has low sample efficiency at larger sample counts, and it only finds better reasoning paths within the model's existing capacity, which makes its total problem-solving coverage smaller.
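One way to make "total problem-solving coverage" concrete is the pass@k metric: the probability that at least one of k sampled answers is correct, averaged over problems. The sketch below uses the standard unbiased pass@k estimator. The per-problem correct counts are hypothetical numbers chosen only to illustrate the pattern described above, not results from any paper, and the names `pass_at_k` and `coverage` are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total samples drawn, c: number of correct samples, k: sample budget.
    Returns the probability that at least one of k samples, drawn without
    replacement from the n, is correct.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset has a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def coverage(correct_counts: list[int], n: int, k: int) -> float:
    """Mean pass@k over a set of problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical correct counts out of n = 100 samples on four problems.
N = 100
base_counts = [30, 5, 2, 1]  # assumed base model: rarely right, never at zero
rl_counts = [90, 60, 0, 0]   # assumed RL model: sharp on some, zero on others

if __name__ == "__main__":
    for k in (1, 10, 100):
        print(k, coverage(base_counts, N, k), coverage(rl_counts, N, k))
```

With these assumed counts, the RL-tuned model wins at k = 1 because it concentrates probability on paths it already solves, while the base model wins at k = 100 because it still occasionally solves the problems the RL model never does.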
SFT Memorizes, RL Generalizes (AI Memory, Model Generalization, OOD)
While the provocative title is not exactly correct, it still provides insight, even for Multimodality.

Agent RL vulnerability

Search LLMs trained with agentic RL may appear safe, but can be easily jailbroken by manipulating the timing of the search step. The RL objective itself fails to suppress harmful queries, making "search first" behavior a critical vulnerability.