CoH
모든 유형의 피드백을 문장으로 변환한 다음 모델을 미세 조정하는 데 사용

Chain of Hindsight Aligns Language Models with Feedback
Learning from human preferences is important for language models to be helpful and useful for humans, and to align with human and social values. Prior work have achieved remarkable successes by...
https://arxiv.org/abs/2302.02676


Seonglae Cho