ORPO

Creator
Seonglae Cho
Created
2024 Mar 16 11:32
Edited
2024 Mar 16 11:33
Refs

Odds Ratio Preference Optimization

Monolithic Preference Optimization without Reference Model
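Trains LLMs in a single stage by combining supervised fine-tuning (SFT) and preference alignment into one objective (loss function): the SFT negative log-likelihood loss plus a weighted odds-ratio term that favors the chosen response over the rejected one, with no frozen reference model.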
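A minimal sketch of the combined objective, assuming the formulation L = L_SFT + λ·L_OR, where L_OR = −log σ(log(odds(y_w|x) / odds(y_l|x))), odds(y|x) = P(y|x) / (1 − P(y|x)), and P(y|x) is the length-normalized likelihood of the response. It assumes per-token log-probabilities of the chosen (y_w) and rejected (y_l) responses have already been computed by the model; the function names and λ = 0.1 are illustrative choices, not taken from this page.

```python
import math


def mean_logprob(token_logprobs):
    # Length-normalized log-likelihood: (1/m) * sum_t log P(y_t | x, y_<t).
    # Assumes every token log-prob is < 0, so the average is too.
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logp):
    # log odds(y|x) = log P - log(1 - P), with P = exp(avg_logp) in (0, 1).
    # log1p(-exp(a)) computes log(1 - exp(a)) accurately for a < 0.
    return avg_logp - math.log1p(-math.exp(avg_logp))


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) for both signs of z.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(chosen_logprobs, rejected_logprobs, lam=0.1):
    # lam (the weight on the odds-ratio term) is an illustrative value.
    avg_w = mean_logprob(chosen_logprobs)
    avg_l = mean_logprob(rejected_logprobs)
    # SFT term: token-averaged negative log-likelihood of the chosen response.
    l_sft = -avg_w
    # Odds-ratio term: penalize when the chosen response's odds
    # do not exceed the rejected response's odds.
    l_or = -log_sigmoid(log_odds(avg_w) - log_odds(avg_l))
    return l_sft + lam * l_or, l_sft, l_or


chosen = [-0.2, -0.1, -0.3]     # hypothetical per-token log-probs of y_w
rejected = [-1.0, -1.2, -0.8]   # hypothetical per-token log-probs of y_l
total, l_sft, l_or = orpo_loss(chosen, rejected)
print(f"total={total:.4f} sft={l_sft:.4f} or={l_or:.4f}")
```

Because both terms are computed from the policy's own log-probabilities, no second (reference) model forward pass is needed, which is the "monolithic" part of the method. When the two responses have equal odds, the odds-ratio term is log 2; it shrinks toward 0 as the chosen response becomes relatively more likely.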