A density estimation perspective on learning from pairwise human preferences

23 November 2023

Vincent Dumoulin

Daniel D. Johnson

Pablo Samuel Castro

Hugo Larochelle

Papers citing "A density estimation perspective on learning from pairwise human preferences"

Capturing Individual Human Preferences with Reward Features. André Barreto, Vincent Dumoulin, Yiran Mao, Nicolas Perez-Nieves, Bobak Shahriari, Yann Dauphin, Doina Precup, Hugo Larochelle. 21 Mar 2025. [ALM]

PIPA: Preference Alignment as Prior-Informed Statistical Estimation. Junbo Li, Zhangyang Wang, Qiang Liu. 09 Feb 2025. [OffRL]

On Extending Direct Preference Optimization to Accommodate Ties. Jinghong Chen, Guangyu Yang, Weizhe Lin, Jingbiao Mei, Bill Byrne. 25 Sep 2024.

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning. S. Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, Natasha Jaques. 19 Aug 2024.

Improving Context-Aware Preference Modeling for Language Models. Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni. 20 Jul 2024.

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning. Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak. 15 Jun 2024.

Optimizing Language Models for Human Preferences is a Causal Inference Problem. Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency. 22 Feb 2024.

Aligning Large Language Models with Human Preferences through Representation Engineering. Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang. 26 Dec 2023.

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF. Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell. 13 Dec 2023.

RLHF and IIA: Perverse Incentives. Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy. 02 Dec 2023.

Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. 04 Mar 2022. [OSLM, ALM]