Odds Ratio Preference Optimization
Monolithic Preference Optimization without Reference Model
Train LLMs by combining SFT and alignment into a new objective (loss function).
https://arxiv.org/pdf/2403.07691.pdf
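
To make "combining SFT and alignment into a new objective" concrete, here is a minimal sketch of the loss for a single preference pair, following the formulation in the linked paper: the usual SFT negative log-likelihood on the chosen response, plus a weighted odds-ratio term that favors the chosen response over the rejected one. The function names, the plain-list inputs, and the default weight `lam=0.1` are illustrative assumptions, not the authors' code. It assumes per-token log-probabilities have already been obtained from the model and that the length-normalized sequence probability is strictly below 1.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given log p, assuming 0 < p < 1."""
    # log(1 - p) is computed as log1p(-exp(log p)).
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_token_logps, rejected_token_logps, lam=0.1):
    """Sketch of the combined SFT + odds-ratio loss for one preference pair.

    chosen_token_logps / rejected_token_logps: per-token log-probabilities
    of the chosen and rejected responses under the model being trained.
    lam: weight of the odds-ratio term (hypothetical default).
    """
    # Length-normalized sequence log-likelihoods.
    logp_w = sum(chosen_token_logps) / len(chosen_token_logps)
    logp_l = sum(rejected_token_logps) / len(rejected_token_logps)

    # SFT part: negative log-likelihood of the chosen response.
    sft_term = -logp_w

    # Alignment part: -log(sigmoid(z)), where z is the log odds ratio.
    z = log_odds(logp_w) - log_odds(logp_l)
    # Numerically stable form of log(1 + exp(-z)).
    or_term = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

    return sft_term + lam * or_term


if __name__ == "__main__":
    # Toy log-probabilities, invented for illustration only.
    print(orpo_loss([-0.2, -0.3], [-1.5, -2.0]))
```

Both terms are computed from the same model, so no separate frozen reference model is needed, which is the point of the "without Reference Model" subtitle.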