2311.14115
A density estimation perspective on learning from pairwise human preferences
23 November 2023
Vincent Dumoulin
Daniel D. Johnson
Pablo Samuel Castro
Hugo Larochelle
Yann Dauphin
Papers citing "A density estimation perspective on learning from pairwise human preferences"
11 / 11 papers shown
Title
Capturing Individual Human Preferences with Reward Features
André Barreto
Vincent Dumoulin
Yiran Mao
Nicolas Perez-Nieves
Bobak Shahriari
Yann Dauphin
Doina Precup
Hugo Larochelle
ALM
59
1
0
21 Mar 2025
PIPA: Preference Alignment as Prior-Informed Statistical Estimation
Junbo Li
Zhangyang Wang
Qiang Liu
OffRL
100
0
0
09 Feb 2025
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen
Guangyu Yang
Weizhe Lin
Jingbiao Mei
Bill Byrne
21
3
0
25 Sep 2024
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
S. Poddar
Yanming Wan
Hamish Ivison
Abhishek Gupta
Natasha Jaques
27
33
0
19 Aug 2024
Improving Context-Aware Preference Modeling for Language Models
Silviu Pitis
Ziang Xiao
Nicolas Le Roux
Alessandro Sordoni
24
8
0
20 Jul 2024
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Jifan Zhang
Lalit P. Jain
Yang Guo
Jiayi Chen
Kuan Lok Zhou
...
Scott Sievert
Timothy Rogers
Kevin Jamieson
Robert Mankoff
Robert Nowak
29
5
0
15 Jun 2024
Optimizing Language Models for Human Preferences is a Causal Inference Problem
Victoria Lin
Eli Ben-Michael
Louis-Philippe Morency
36
3
0
22 Feb 2024
Aligning Large Language Models with Human Preferences through Representation Engineering
Wenhao Liu
Xiaohua Wang
Muling Wu
Tianlong Li
Changze Lv
Zixuan Ling
Jianhao Zhu
Cenyuan Zhang
Xiaoqing Zheng
Xuanjing Huang
11
29
0
26 Dec 2023
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Anand Siththaranjan
Cassidy Laidlaw
Dylan Hadfield-Menell
13
54
0
13 Dec 2023
RLHF and IIA: Perverse Incentives
Wanqiao Xu
Shi Dong
Xiuyuan Lu
Grace Lam
Zheng Wen
Benjamin Van Roy
11
2
0
02 Dec 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022