ORPO

Creator

Created

2024 Mar 16 11:32

Editor

Edited

2024 Mar 16 11:33

Refs

ORPO: Monolithic Preference Optimization without Reference Model

Trains LLMs by combining supervised fine-tuning (SFT) and preference alignment into a single objective (loss function), so no separate reference model is needed.
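
A minimal sketch of what that combined objective can look like for one preference pair, assuming the formulation from the ORPO paper: an SFT negative log-likelihood term on the chosen response plus a weighted odds-ratio penalty. The function names, the `lam` weight, and its default value are illustrative assumptions, not something taken from this note. Inputs are assumed to be length-averaged log-probabilities (strictly negative) that the model assigns to the chosen and rejected responses.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """SFT loss on the chosen response plus lam times the odds-ratio loss.

    lam is a hypothetical weighting hyperparameter chosen for illustration.
    """
    # SFT term: negative log-likelihood of the chosen response.
    sft_term = -chosen_avg_logp
    # Alignment term: push the odds of the chosen response above the rejected one.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    odds_ratio_term = -log_sigmoid(log_odds_ratio)
    return sft_term + lam * odds_ratio_term


if __name__ == "__main__":
    # The loss should be lower when the model already prefers the chosen response.
    print(orpo_loss(-0.2, -1.0))
    print(orpo_loss(-1.0, -0.2))
```

Both terms come from the same policy model in one forward pass per response, which is why no frozen reference model appears anywhere in the sketch.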

///////