2211.15006
Cited By
Fine-tuning language models to find agreement among humans with diverse preferences
28 November 2022
Michiel A. Bakker
Martin Chadwick
Hannah R. Sheahan
Michael Henry Tessler
Lucy Campbell-Gillingham
Jan Balaguer
Nat McAleese
Amelia Glaese
John Aslanides
M. Botvinick
Christopher Summerfield
ALM
Papers citing "Fine-tuning language models to find agreement among humans with diverse preferences"
23 / 123 papers shown
Title
"Tidy Up the Table": Grounding Common-sense Objective for Tabletop Object Rearrangement
Yiqing Xu
David Hsu
LM&Ro
LMTD
21
0
0
21 Jul 2023
Evaluating AI systems under uncertain ground truth: a case study in dermatology
David Stutz
A. Cemgil
Abhijit Guha Roy
Tatiana Matejovicova
Melih Barsbey
...
Yossi Matias
Pushmeet Kohli
Yun-hui Liu
Arnaud Doucet
Alan Karthikesalingam
25
4
0
05 Jul 2023
Proportional Aggregation of Preferences for Sequential Decision Making
Nikhil Chandak
Shashwat Goel
Dominik Peters
22
9
0
26 Jun 2023
Opportunities and Risks of LLMs for Scalable Deliberation with Polis
Christopher T. Small
Ivan Vendrov
Esin Durmus
Hadjar Homaei
Elizabeth Barry
Julien Cornebise
Ted Suzman
Deep Ganguli
Colin Megill
24
26
0
20 Jun 2023
Learning to Generate Better Than Your LLM
Jonathan D. Chang
Kianté Brantley
Rajkumar Ramamurthy
Dipendra Kumar Misra
Wen Sun
19
40
0
20 Jun 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
35
135
0
07 Jun 2023
Prompt Evolution for Generative AI: A Classifier-Guided Approach
Melvin Wong
Yew-Soon Ong
Abhishek Gupta
K. Bali
Caishun Chen
16
14
0
24 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
40
537
0
22 May 2023
Self-Agreement: A Framework for Fine-tuning Language Models to Find Agreement among Diverse Opinions
Shiyao Ding
Takayuki Ito
SyDa
11
6
0
19 May 2023
Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models
Wanqiao Xu
Shi Dong
Dilip Arumugam
Benjamin Van Roy
25
8
0
19 May 2023
Working Memory Capacity of ChatGPT: An Empirical Study
Dongyu Gong
Xingchen Wan
Dingmin Wang
LLMAG
KELM
AI4MH
10
12
0
30 Apr 2023
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
216
299
0
26 Apr 2023
Evaluating Verifiability in Generative Search Engines
Nelson F. Liu
Tianyi Zhang
Percy Liang
HILM
14
231
0
19 Apr 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
33
99
0
09 Mar 2023
'Generative CI' through Collective Response Systems
Aviv Ovadya
11
10
0
01 Feb 2023
Debiasing Vision-Language Models via Biased Prompts
Ching-Yao Chuang
Varun Jampani
Yuanzhen Li
Antonio Torralba
Stefanie Jegelka
VLM
28
96
0
31 Jan 2023
Inclusive Artificial Intelligence
Dilip Arumugam
Shi Dong
Benjamin Van Roy
33
1
0
24 Dec 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
500
0
28 Sep 2022
In conversation with Artificial Intelligence: aligning language models with human values
Atoosa Kasirzadeh
Iason Gabriel
10
98
0
01 Sep 2022
Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
Gati Aher
Rosa I. Arriaga
Adam Tauman Kalai
35
343
0
18 Aug 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
235
257
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
308
11,909
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,587
0
18 Sep 2019