SimPO

Creator: Seonglae Cho
Created: 2024 May 28 9:45
Editor: Seonglae Cho
Edited: 2024 May 29 1:5
Refs
SimPO: Simple Preference Optimization with a Reference-Free Reward
Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to...
https://arxiv.org/abs/2405.14734
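SimPO drops DPO's reference model: the implicit reward becomes the length-normalized average log probability of the response under the policy itself, and a target reward margin γ is required between the winning and losing responses. A sketch of the two objectives as described in the paper above, where β is the reward scaling factor, γ the target margin, and (y_w, y_l) the preferred and dispreferred responses:

$$\mathcal{L}_{\text{DPO}} = -\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\text{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\text{ref}}(y_l\mid x)}\right)\right]$$

$$\mathcal{L}_{\text{SimPO}} = -\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\frac{\beta}{|y_w|}\log\pi_\theta(y_w\mid x) - \frac{\beta}{|y_l|}\log\pi_\theta(y_l\mid x) - \gamma\right)\right]$$

Removing π_ref avoids the memory and compute of a second model forward pass, while the length normalization counteracts the bias toward longer responses.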

Copyright Seonglae Cho