RLAIF

Creator: Seonglae Cho
Created: 2023 Sep 10 4:49
Editor: Seonglae Cho
Edited: 2024 May 18 9:25
Refs: Pareto efficiency

Shepherd

Shepherd: A Critic for Language Model Generation
As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs. In this work, we introduce Shepherd, a language...
https://arxiv.org/abs/2308.04592
https://arxiv.org/pdf/2309.00267.pdf

Backlinks

Language Model RL
Critic based reward model

Copyright Seonglae Cho