RLAIF

Creator

Seonglae Cho

Created

2023 Sep 10 4:49

Editor

Seonglae Cho

Edited

2024 May 18 9:25

Refs

Pareto efficiency

Shepherd

Shepherd: A Critic for Language Model Generation

As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs. In this work, we introduce Shepherd, a language...

https://arxiv.org/abs/2308.04592

https://arxiv.org/pdf/2309.00267.pdf
