https://arxiv.org/pdf/2401.10020
Meta-Rewarding Language Models: Self-Improving Alignment with...
Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human data, recent self-rewarding mechanisms (Yuan...
https://openreview.net/forum?id=lbj0i29Z92