New Desiderata for Direct Preference Optimization

12 July 2024

Tong He

Papers citing "New Desiderata for Direct Preference Optimization"

Direct Preference Optimization with an Offset; Afra Amini, Tim Vieira, Ryan Cotterell; 68 22 0; 16 Feb 2024
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned; Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark; 211 327 0; 23 Aug 2022
Training language models to follow instructions with human feedback; Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe; OSLM, ALM; 301 11,730 0; 04 Mar 2022
Fine-Tuning Language Models from Human Preferences; Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving; ALM; 273 1,561 0; 18 Sep 2019
Revisiting Bayesian Blind Deconvolution; David Wipf, Haichao Zhang; 38 118 0; 10 May 2013