Chain of Hindsight Aligns Language Models with Feedback
Learning from human preferences is important for language models to be helpful and useful for humans, and to align with human and social values. Prior work have achieved remarkable successes by...
https://arxiv.org/abs/2302.02676