
f-PO: Generalizing Preference Optimization with f-divergence Minimization

Abstract

Preference optimization has made significant progress recently, with numerous methods developed to align language models with human preferences. This paper introduces f-divergence Preference Optimization (f-PO), a novel framework that generalizes and extends existing approaches. f-PO minimizes f-divergences between the optimized policy and the optimal policy, encompassing a broad family of alignment methods using various divergences. Our approach unifies previous algorithms like DPO and EXO, while offering new variants through different choices of f-divergences. We provide theoretical analysis of f-PO's properties and conduct extensive experiments on state-of-the-art language models using benchmark datasets. Results demonstrate f-PO's effectiveness across various tasks, achieving superior performance compared to existing methods on popular benchmarks such as AlpacaEval 2, Arena-Hard, MT-Bench, and Open LLM Leaderboard v2. Additionally, we present ablation studies exploring the impact of different f-divergences, offering insights into the trade-offs between regularization and performance in offline preference optimization. Our work contributes both practical algorithms and theoretical understanding to the field of language model alignment. Code is available at this https URL.
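
To make the high-level description concrete, the sketch below writes the objective as an f-divergence between the learned policy and the KL-regularized optimal policy familiar from the DPO literature. The divergence direction, the reward-tilted form of the optimal policy pi*, and the normalizer Z(x) are illustrative assumptions here, not the paper's exact parameterization or estimator.

% Illustrative f-PO-style objective (assumed form, for intuition only).
% f is any convex function with f(1) = 0; per the abstract, different choices of f
% recover different preference-optimization variants (e.g., DPO- and EXO-like objectives).
% The optimal-policy form below is the standard KL-regularized RLHF solution and is an
% assumption about how pi^* is defined here.
\[
  \pi^{*}(y \mid x) \;=\; \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y \mid x)\,
      \exp\!\Big(\tfrac{r(x,y)}{\beta}\Big),
  \qquad
  \mathcal{L}_{f\text{-PO}}(\theta) \;=\;
      \mathbb{E}_{x}\!\left[ D_{f}\big(\pi_{\theta}(\cdot \mid x)\,\big\|\,\pi^{*}(\cdot \mid x)\big) \right],
\]
\[
  \text{where } \;
  D_{f}(P \,\|\, Q) \;=\; \mathbb{E}_{y \sim Q}\!\left[ f\!\left( \frac{P(y)}{Q(y)} \right) \right].
\]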

@article{han2025_2410.21662,
  title={$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization},
  author={Jiaqi Han and Mingjian Jiang and Yuxuan Song and Stefano Ermon and Minkai Xu},
  journal={arXiv preprint arXiv:2410.21662},
  year={2025}
}