KTO

Created
2025 Jul 2 14:28
Creator
Seonglae Cho
Edited
2025 Jul 2 14:47

Kahneman-Tversky Optimization

https://arxiv.org/pdf/2402.01306
Looking at the loss function, it is essentially DPO with a KL-divergence term and a sigmoid value function added; these introduce a reference point and create concave/convex regions around it, mirroring prospect theory.
Unlike methods that require preference pairs, it achieves equal or better performance than traditional DPO using only binary "desirable/undesirable" signals, while being robust to data imbalance and hyperparameter choices. It also showed no performance degradation relative to DPO even when applied directly after pretraining, skipping the SFT stage.
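Concretely, the objective can be sketched as follows (notation as in the paper: r is the policy/reference log-ratio, z0 the KL reference point, and λ_D, λ_U the weights on desirable/undesirable examples):

```latex
r_\theta(x,y) = \log\frac{\pi_\theta(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)}, \qquad
z_0 = \mathrm{KL}\!\left(\pi_\theta(y'\mid x)\,\|\,\pi_{\mathrm{ref}}(y'\mid x)\right)

v(x,y) =
\begin{cases}
\lambda_D\,\sigma\!\big(\beta\,(r_\theta(x,y) - z_0)\big) & \text{if } y \text{ is desirable} \\
\lambda_U\,\sigma\!\big(\beta\,(z_0 - r_\theta(x,y))\big) & \text{if } y \text{ is undesirable}
\end{cases}

\mathcal{L}_{\mathrm{KTO}}(\pi_\theta;\pi_{\mathrm{ref}}) = \mathbb{E}_{x,y\sim D}\big[\lambda_y - v(x,y)\big]
```

The sigmoid makes the value function concave in gains and convex in losses relative to z0, which is where the Kahneman-Tversky name comes from.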
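A minimal per-example sketch of this loss in plain Python. The function name and arguments are my own; in practice the log-probabilities come from the policy and frozen reference models, and z0 is estimated per batch and detached from the gradient, which this toy version glosses over:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logp: float, ref_logp: float, desirable: bool,
             z0: float, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Per-example KTO loss (sketch, not the reference implementation).

    r is the implied reward (policy/reference log-ratio); z0 is the
    KL-based reference point. Desirable examples are pushed above z0,
    undesirable ones below it, each through a saturating sigmoid.
    """
    r = policy_logp - ref_logp
    if desirable:
        value = lambda_d * sigmoid(beta * (r - z0))
        return lambda_d - value
    else:
        value = lambda_u * sigmoid(beta * (z0 - r))
        return lambda_u - value
```

Because each example carries only a binary label, desirable and undesirable data can be weighted independently via lambda_d and lambda_u, which is what makes the method robust to class imbalance.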

HALOs (Human-Aware Losses)

A concept proposed in the paper: a family of loss functions that explicitly incorporate a human value function v. The paper shows that existing methods such as DPO and PPO-Clip also fall into the HALO family.