KTO

Created
2025 Jul 2 14:28
Creator
Seonglae Cho
Edited
2025 Jul 2 14:47

Kahneman-Tversky Optimization

https://arxiv.org/pdf/2402.01306
Looking at the loss function, it is essentially DPO with a KL-divergence term and a sigmoid value function added; these introduce a reference point and create concave/convex regions around it, mirroring prospect theory.
Unlike methods that require preference pairs, it achieves equal or better performance than traditional DPO using only binary "desirable/undesirable" signals, while being robust to data imbalance and hyperparameter choices. It also showed no performance degradation relative to DPO even when applied directly after pretraining, skipping the SFT stage.
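Concretely, the objective can be sketched as follows (notation as in the paper: r is the policy/reference log-ratio, z0 the KL reference point, and λ_D, λ_U the weights on desirable/undesirable examples):

```latex
r_\theta(x,y) = \log\frac{\pi_\theta(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)}, \qquad
z_0 = \mathrm{KL}\!\left(\pi_\theta(y'\mid x)\,\|\,\pi_{\mathrm{ref}}(y'\mid x)\right)

v(x,y) =
\begin{cases}
\lambda_D\,\sigma\!\big(\beta\,(r_\theta(x,y) - z_0)\big) & \text{if } y \text{ is desirable} \\
\lambda_U\,\sigma\!\big(\beta\,(z_0 - r_\theta(x,y))\big) & \text{if } y \text{ is undesirable}
\end{cases}

\mathcal{L}_{\mathrm{KTO}}(\pi_\theta;\pi_{\mathrm{ref}}) = \mathbb{E}_{x,y\sim D}\big[\lambda_y - v(x,y)\big]
```

The sigmoid makes the value function concave in gains and convex in losses relative to z0, which is where the Kahneman-Tversky name comes from.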
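A minimal per-example sketch of this loss in plain Python. The function name and arguments are my own; in practice the log-probabilities come from the policy and frozen reference models, and z0 is estimated per batch and detached from the gradient, which this toy version glosses over:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logp: float, ref_logp: float, desirable: bool,
             z0: float, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Per-example KTO loss (sketch, not the reference implementation).

    r is the implied reward (policy/reference log-ratio); z0 is the
    KL-based reference point. Desirable examples are pushed above z0,
    undesirable ones below it, each through a saturating sigmoid.
    """
    r = policy_logp - ref_logp
    if desirable:
        value = lambda_d * sigmoid(beta * (r - z0))
        return lambda_d - value
    else:
        value = lambda_u * sigmoid(beta * (z0 - r))
        return lambda_u - value
```

Because each example carries only a binary label, desirable and undesirable data can be weighted independently via lambda_d and lambda_u, which is what makes the method robust to class imbalance.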

HALOs (Human-Aware Losses)

A concept proposed in the paper: a family of loss functions that explicitly incorporate a human value function v. The paper shows that existing methods such as DPO and PPO-Clip also fall into the HALO family.