Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data

Main: 8 pages · Appendix: 6 pages · Bibliography: 4 pages · 9 figures · 15 tables
Abstract

Supervised fine-tuning (SFT) followed by preference optimization (PO), denoted SFT→PO, has become the standard for improving pretrained large language models (LLMs), with PO demonstrating significant performance gains. However, PO methods rely on either human-labeled preference data or a strong reward model to generate preference data. Can we fine-tune LLMs without preference data or reward models while achieving performance competitive with SFT→PO? We address this question by introducing Discriminative Fine-Tuning (DFT), a novel approach that eliminates the need for preference data. Unlike SFT, which employs a generative approach and overlooks negative data, DFT adopts a discriminative paradigm that increases the probability of positive answers while suppressing potentially negative ones, shifting from token prediction to data prediction. Our contributions include: (i) a discriminative probabilistic framework for fine-tuning LLMs by explicitly modeling the discriminative likelihood of an answer among all possible outputs given an input; (ii) efficient algorithms to optimize this discriminative likelihood; and (iii) extensive experiments demonstrating DFT's effectiveness, achieving performance better than SFT and comparable to, if not better than, SFT→PO. The code can be found at this https URL.
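To make the discriminative objective concrete, the following is a minimal PyTorch-style sketch, not the authors' released implementation: it assumes the intractable normalization over all possible outputs is approximated with K negative answers sampled per input (e.g., from the model itself), and that each candidate answer is scored by its sequence log-probability. All function names, tensor shapes, and the negative-sampling scheme are illustrative assumptions.

import torch
import torch.nn.functional as F

def sequence_logprob(model, ids, answer_mask):
    # Sum of token log-probabilities over the answer span; answer_mask
    # zeroes out prompt positions so only the answer is scored.
    logits = model(ids).logits[:, :-1, :]                 # next-token logits
    token_logps = torch.log_softmax(logits, dim=-1).gather(
        -1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return (token_logps * answer_mask[:, 1:]).sum(dim=-1)  # (B,)

def dft_loss(model, pos_ids, pos_mask, neg_ids, neg_mask):
    # Softmax over one positive and K sampled negative answers per input:
    # the cross-entropy pushes up the positive's probability mass and
    # suppresses the sampled negatives, i.e., "data prediction" rather
    # than per-token prediction.
    pos = sequence_logprob(model, pos_ids, pos_mask)                    # (B,)
    B, K, T = neg_ids.shape
    neg = sequence_logprob(model, neg_ids.reshape(B * K, T),
                           neg_mask.reshape(B * K, T)).reshape(B, K)    # (B, K)
    scores = torch.cat([pos.unsqueeze(1), neg], dim=1)                  # (B, 1+K)
    labels = torch.zeros(B, dtype=torch.long, device=scores.device)    # positive at index 0
    return F.cross_entropy(scores, labels)

Under these assumptions the contrast with SFT is direct: SFT maximizes the positive answer's token-level likelihood alone, whereas this loss also normalizes against competing answers, so gradient signal flows from the negatives as well.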
