ORPO: Monolithic Preference Optimization without Reference Model

This week's paper is ORPO: Monolithic Preference Optimization without Reference Model.

Further Reading:
- Provably Robust DPO: Aligning Language Models with Noisy Feedback
- Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
- ORPO math derivation
- Useful ORPO training details
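For context on what the paper proposes: ORPO adds an odds-ratio penalty to the standard SFT loss, so preference optimization needs no separate reference model. A minimal sketch of the per-pair objective, assuming the average token log-probabilities of the chosen and rejected responses are already computed (the function name and the weight `lam` are illustrative, not from the paper's code):

```python
import math

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO objective for a single preference pair.

    logp_chosen / logp_rejected: average token log-probabilities of the
    chosen and rejected responses under the policy model (assumed inputs,
    both < 0).
    """
    # log odds(p) = log(p / (1 - p)), computed from log p in a stable form
    def log_odds(logp: float) -> float:
        return logp - math.log1p(-math.exp(logp))

    # odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected))
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    l_or = math.log1p(math.exp(-ratio))  # equals -log sigmoid(ratio)

    # total loss: NLL (SFT) term on the chosen response + weighted OR penalty
    l_sft = -logp_chosen
    return l_sft + lam * l_or
```

The loss shrinks as the model assigns the chosen response higher relative odds, while the SFT term keeps the chosen response likely in absolute terms.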