Preference Optimization

Creator

Seonglae Cho

Created

2023 Sep 24 4:20

Editor

Seonglae Cho

Edited

2025 Aug 1 16:46

Refs

Language Model RL

Preference Optimization methods

Preference Elicitation from RL

https://arxiv.org/pdf/1706.03741

Learning from human preferences

One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind’s safety team, we’ve developed an algorithm which can infer what humans want by being told which of two proposed behaviors is better.

https://openai.com/index/learning-from-human-preferences/
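
The idea of inferring what humans want from "which of two behaviors is better" can be sketched as fitting a reward function to pairwise comparisons with a Bradley-Terry (logistic) preference model. The following is a minimal sketch, not the authors' implementation: it assumes each behavior is summarized by a fixed-length feature vector and that the reward is linear in those features, whereas the linked work trains a neural network on trajectory segments. The names `preference_prob`, `reward`, and `fit_reward` are hypothetical.

```python
import math


def preference_prob(r_a, r_b):
    """Bradley-Terry probability that behavior A is preferred over B,
    given their scalar rewards r_a and r_b."""
    return 1.0 / (1.0 + math.exp(r_b - r_a))


def reward(w, x):
    """Assumed linear reward: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_reward(pairs, dim, lr=0.5, epochs=200):
    """Fit reward weights by gradient ascent on the preference log-likelihood.

    pairs: list of (preferred_features, rejected_features) tuples.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for win, lose in pairs:
            p = preference_prob(reward(w, win), reward(w, lose))
            # d/dw log p = (1 - p) * (win - lose)
            for i in range(dim):
                grad[i] += (1.0 - p) * (win[i] - lose[i])
        w = [wi + lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w
```

A fitted `w` can then score unseen behaviors with `reward(w, x)`; in the full setting this learned reward would stand in for a hand-written goal function when training a policy with RL.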

Utility Engineering

AI preferences form value systems with a high degree of structural coherence, and this coherence emerges with scale

https://arxiv.org/pdf/2502.08640

Upper & Lower Bounds

https://arxiv.org/pdf/2502.05934

Backlinks

Reinforcement Learning BiDPO
