Chain of Hindsight

Creator

Seonglae Cho

Created

2023 Jun 29 12:25

Editor

Seonglae Cho

Edited

2025 Jan 15 15:25

Refs

CoH

Converts every type of feedback into natural-language sentences, which are then used to fine-tune the model.
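The conversion step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the template phrases, the `build_coh_example` and `inference_prompt` helpers, and whitespace tokenization are all hypothetical. It shows the core idea: a preferred and a rejected answer are joined into one training sequence, each introduced by a natural-language feedback phrase. In this sketch the language-modeling loss is masked so that only the answer tokens are trained on (an assumption made here for illustration).

```python
import random
from dataclasses import dataclass

# Hypothetical feedback templates (positive phrase, negative phrase).
# The exact wording is an assumption for illustration only.
TEMPLATES = [
    ("A helpful answer:", "An unhelpful answer:"),
    ("A good response:", "A bad response:"),
]


@dataclass
class CoHExample:
    tokens: list[str]
    loss_mask: list[int]  # 1 = compute LM loss on this token, 0 = context only


def build_coh_example(prompt: str, preferred: str, rejected: str,
                      rng: random.Random) -> CoHExample:
    """Turn a (preferred, rejected) feedback pair into one training sequence.

    Whitespace splitting stands in for a real tokenizer (assumption).
    """
    good_phrase, bad_phrase = rng.choice(TEMPLATES)
    # Randomly order the pair so the model sees both "good then bad"
    # and "bad then good" chains.
    segments = [(good_phrase, preferred), (bad_phrase, rejected)]
    if rng.random() < 0.5:
        segments.reverse()

    tokens: list[str] = []
    mask: list[int] = []

    def add(text: str, train: bool) -> None:
        words = text.split()
        tokens.extend(words)
        mask.extend([int(train)] * len(words))

    add(prompt, train=False)          # the original prompt is context only
    for phrase, answer in segments:
        add(phrase, train=False)      # feedback sentence: conditioning only
        add(answer, train=True)       # answer text: receives the loss
    return CoHExample(tokens, mask)


def inference_prompt(prompt: str) -> str:
    """At generation time, condition on the positive feedback phrase."""
    return f"{prompt} {TEMPLATES[0][0]}"


if __name__ == "__main__":
    rng = random.Random(0)
    ex = build_coh_example("Q: What is 2+2?", "It is 4.", "It is 5.", rng)
    for tok, m in zip(ex.tokens, ex.loss_mask):
        print(m, tok)
    print(inference_prompt("Q: What is 2+2?"))
```

The idea behind `inference_prompt` is that, after fine-tuning on such chains, the model can be steered toward the preferred behavior simply by appending the positive feedback phrase to the prompt.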

Chain of Hindsight Aligns Language Models with Feedback

Learning from human preferences is important for language models to be helpful and useful for humans, and to align with human and social values. Prior work have achieved remarkable successes by...

https://arxiv.org/abs/2302.02676
