
f-PO: Generalizing Preference Optimization with f-divergence Minimization

International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Main: 9 pages · Appendix: 5 pages · Bibliography: 1 page · 4 figures · 6 tables
Abstract

Preference optimization has made significant progress recently, with numerous methods developed to align language models with human preferences. This paper introduces f-divergence Preference Optimization (f-PO), a novel framework that generalizes and extends existing approaches. f-PO minimizes f-divergences between the optimized policy and the optimal policy, encompassing a broad family of alignment methods using various divergences. Our approach unifies previous algorithms like DPO and EXO, while offering new variants through different choices of f-divergences. We provide theoretical analysis of f-PO's properties and conduct extensive experiments on state-of-the-art language models using benchmark datasets. Results demonstrate f-PO's effectiveness across various tasks, achieving superior performance compared to existing methods on popular benchmarks such as AlpacaEval 2, Arena-Hard, MT-Bench, and Open LLM Leaderboard v2. Additionally, we present ablation studies exploring the impact of different f-divergences, offering insights into the trade-offs between regularization and performance in offline preference optimization. Our work contributes both practical algorithms and theoretical understanding to the field of language model alignment. Code is available at this https URL.
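The abstract's central ingredient is the choice of f-divergence D_f(P ‖ Q) = Σ_x Q(x) f(P(x)/Q(x)), where different convex generators f recover different alignment objectives (e.g., forward vs. reverse KL). As an illustrative sketch only, not the paper's exact training objective, the snippet below implements several standard generators and evaluates the resulting divergences on discrete distributions; all function names are hypothetical:

```python
import math

# Convex generators f (with f(1) = 0) for common f-divergences.
# D_f(P || Q) = sum_x Q(x) * f(P(x) / Q(x)).
F_GENERATORS = {
    # f(t) = t log t  ->  D_f(P||Q) = KL(P||Q)   (forward KL)
    "forward_kl": lambda t: t * math.log(t),
    # f(t) = -log t   ->  D_f(P||Q) = KL(Q||P)   (reverse KL)
    "reverse_kl": lambda t: -math.log(t),
    # Jensen-Shannon generator (symmetric, bounded)
    "js": lambda t: t * math.log(2 * t / (t + 1)) + math.log(2 / (t + 1)),
    # Total variation: f(t) = |t - 1| / 2
    "total_variation": lambda t: 0.5 * abs(t - 1),
}

def f_divergence(p, q, f):
    """D_f(P || Q) for two discrete distributions given as lists of probabilities."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)

# Toy example: compare a policy's implied preference distribution over
# (chosen, rejected) responses against a sharper target distribution.
policy = [0.80, 0.20]
target = [0.95, 0.05]
for name, f in F_GENERATORS.items():
    print(f"{name:16s} D_f(policy || target) = {f_divergence(policy, target, f):.4f}")
```

Swapping the generator changes how hard the objective penalizes probability mass the policy puts where the target has little (reverse KL is mode-seeking, forward KL mass-covering), which is the regularization/performance trade-off the paper's ablations explore.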
