Language Model RL

Creator
Seonglae Cho
Created
2023 Sep 9 17:8
Edited
2025 Apr 30 23:58
Language Model Rewards
Language Model Reward Benchmarks
Reinforcement Learning Transformers
Language Model RL Frameworks

Era of Experience

However, RL has low sample efficiency at larger sample counts, and it only finds better reasoning paths within the model's existing capacity, which makes its total problem-solving coverage smaller.
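One way to make the coverage claim concrete is a pass@k-style measurement, sketched below. This assumes "coverage" means the fraction of problems solved by at least one of k samples, which the note does not state explicitly. The estimator is the standard unbiased pass@k formula; the per-problem correct counts are made-up toy numbers chosen only to illustrate the shape of the effect, not measured results, and the names `pass_at_k` and `coverage` are hypothetical helpers.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def coverage(correct_counts: list[int], n: int, k: int) -> float:
    """Mean pass@k over problems, used here as a proxy for coverage."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Toy assumption: 4 problems, n = 100 samples each (illustrative numbers only).
N = 100
base_counts = [30, 5, 2, 1]  # base model: rarely right, but never at zero
rl_counts = [90, 60, 0, 0]   # RL-tuned model: reliable on two, never solves two

for k in (1, 10, 100):
    base_cov = coverage(base_counts, N, k)
    rl_cov = coverage(rl_counts, N, k)
    print(f"k={k:3d}  base={base_cov:.3f}  rl={rl_cov:.3f}")
```

With these toy counts the RL-tuned model scores higher at k = 1, while the base model scores higher at k = 100, because it occasionally solves problems that the RL-tuned model never does.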
SFT Memorizes, RL Generalizes (AI Memory, Model Generalization, OOD)
Although the provocative title is not exactly accurate, it provides insight even for multimodality.
 
 
