Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data

Main: 8 pages · Appendix: 6 pages · Bibliography: 4 pages · 9 figures · 15 tables
Abstract

Supervised fine-tuning (SFT) followed by preference optimization (PO), denoted SFT→PO, has become the standard for improving pretrained large language models (LLMs), with PO demonstrating significant performance gains. However, PO methods rely on either human-labeled preference data or a strong reward model to generate preference data. Can we fine-tune LLMs without preference data or reward models while achieving performance competitive with SFT→PO? We address this question by introducing Discriminative Fine-Tuning (DFT), a novel approach that eliminates the need for preference data. Unlike SFT, which employs a generative approach and overlooks negative data, DFT adopts a discriminative paradigm that increases the probability of positive answers while suppressing potentially negative ones, shifting from token prediction to data prediction. Our contributions include: (i) a discriminative probabilistic framework for fine-tuning LLMs by explicitly modeling the discriminative likelihood of an answer among all possible outputs given an input; (ii) efficient algorithms to optimize this discriminative likelihood; and (iii) extensive experiments demonstrating DFT's effectiveness, achieving performance better than SFT and comparable to, if not better than, SFT→PO. The code can be found at this https URL.
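To make the discriminative objective concrete, the following is a minimal PyTorch-style sketch, not the authors' released implementation: it assumes the intractable normalization over all possible outputs is approximated with K negative answers sampled per input (e.g., from the model itself), and that each candidate answer is scored by its sequence log-probability. All function names, tensor shapes, and the negative-sampling scheme are illustrative assumptions.

import torch
import torch.nn.functional as F

def sequence_logprob(model, ids, answer_mask):
    # Sum of token log-probabilities over the answer span; answer_mask
    # zeroes out prompt positions so only the answer is scored.
    logits = model(ids).logits[:, :-1, :]                 # next-token logits
    token_logps = torch.log_softmax(logits, dim=-1).gather(
        -1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return (token_logps * answer_mask[:, 1:]).sum(dim=-1)  # (B,)

def dft_loss(model, pos_ids, pos_mask, neg_ids, neg_mask):
    # Softmax over one positive and K sampled negative answers per input:
    # the cross-entropy pushes up the positive's probability mass and
    # suppresses the sampled negatives, i.e., "data prediction" rather
    # than per-token prediction.
    pos = sequence_logprob(model, pos_ids, pos_mask)                    # (B,)
    B, K, T = neg_ids.shape
    neg = sequence_logprob(model, neg_ids.reshape(B * K, T),
                           neg_mask.reshape(B * K, T)).reshape(B, K)    # (B, K)
    scores = torch.cat([pos.unsqueeze(1), neg], dim=1)                  # (B, 1+K)
    labels = torch.zeros(B, dtype=torch.long, device=scores.device)    # positive at index 0
    return F.cross_entropy(scores, labels)

Under these assumptions the contrast with SFT is direct: SFT maximizes the positive answer's token-level likelihood alone, whereas this loss also normalizes against competing answers, so gradient signal flows from the negatives as well.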
