Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1706.03741
Cited By
Deep reinforcement learning from human preferences
12 June 2017
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Deep reinforcement learning from human preferences"
50 / 699 papers shown
Title
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
86
48
0
23 Apr 2024
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Amir Saeidi
Shivanshu Verma
Chitta Baral
Chitta Baral
ALM
43
23
0
23 Apr 2024
Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer
Rachel Freedman
J. Heitzig
Wesley H. Holliday
Bob M. Jacobs
...
Eric Pacuit
Stuart Russell
Hailey Schoelkopf
Emanuel Tewolde
W. Zwicker
50
30
0
16 Apr 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models
Ruibo Liu
Jerry W. Wei
Fangyu Liu
Chenglei Si
Yanzhe Zhang
...
Steven Zheng
Daiyi Peng
Diyi Yang
Denny Zhou
Andrew M. Dai
SyDa
EgoV
45
87
0
11 Apr 2024
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
71
5
0
11 Apr 2024
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner
Yang Gao
Dana Alon
Donald Metzler
AAML
41
18
0
08 Apr 2024
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang
Marc Marone
Tianjian Li
Benjamin Van Durme
Daniel Khashabi
93
9
0
05 Apr 2024
VLRM: Vision-Language Models act as Reward Models for Image Captioning
Maksim Dzabraev
Alexander Kunitsyn
Andrei Ivaniuta
VLM
MLLM
31
3
0
02 Apr 2024
Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model
Qi Gou
Cam-Tu Nguyen
35
8
0
28 Mar 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
53
122
0
28 Mar 2024
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi
Zenghui Yuan
Yinuo Liu
Yue Huang
Pan Zhou
Lichao Sun
Neil Zhenqiang Gong
AAML
47
44
0
26 Mar 2024
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Deyi Xiong
ELM
43
0
0
19 Mar 2024
Helpful or Harmful? Exploring the Efficacy of Large Language Models for Online Grooming Prevention
Ellie Prosser
Matthew Edwards
44
6
0
14 Mar 2024
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu
Fangyun Wei
Yanye Lu
MLLM
VLM
54
18
0
12 Mar 2024
A Generalized Acquisition Function for Preference-based Reward Learning
Evan Ellis
Gaurav R. Ghosal
Stuart J. Russell
Anca Dragan
Erdem Biyik
42
2
0
09 Mar 2024
A Survey on Human-AI Teaming with Large Pre-Trained Models
Vanshika Vats
Marzia Binta Nizam
Minghao Liu
Ziyuan Wang
Richard Ho
...
Celeste Shen
Rachel Shen
Nafisa Hussain
Kesav Ravichandran
James Davis
LM&MA
50
8
0
07 Mar 2024
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
Yu Zhu
Chuxiong Sun
Wenfei Yang
Wenqiang Wei
Simin Niu
...
Zhiyu Li
Shifeng Zhang
Zhiyu Li
Jie Hu
Mingchuan Yang
42
3
0
07 Mar 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu
Maosong Sun
Xing Xie
OffRL
47
16
0
07 Mar 2024
Preference optimization of protein language models as a multi-objective binder design paradigm
Pouria A. Mistani
Venkatesh Mysore
45
6
0
07 Mar 2024
Should We Fear Large Language Models? A Structural Analysis of the Human Reasoning System for Elucidating LLM Capabilities and Risks Through the Lens of Heidegger's Philosophy
Jianqiiu Zhang
ELM
40
1
0
05 Mar 2024
Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models
Dimitris Papadimitriou
Daniel S. Brown
53
1
0
04 Mar 2024
Memory-Augmented Generative Adversarial Transformers
Stephan Raaijmakers
Roos Bakker
Anita Cremers
R. D. Kleijn
Tom Kouwenhoven
Tessa Verhoef
41
0
0
29 Feb 2024
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
Haoxiang Wang
Yong Lin
Wei Xiong
Rui Yang
Shizhe Diao
Shuang Qiu
Han Zhao
Tong Zhang
40
72
0
28 Feb 2024
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
Jie Cheng
Gang Xiong
Xingyuan Dai
Qinghai Miao
Yisheng Lv
Fei-Yue Wang
35
15
0
27 Feb 2024
Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan
Shoumik Saha
Gaurang Sriramanan
Priyatham Kattakinda
Atoosa Malemir Chegini
S. Feizi
MIALM
43
34
0
23 Feb 2024
Optimizing Language Models for Human Preferences is a Causal Inference Problem
Victoria Lin
Eli Ben-Michael
Louis-Philippe Morency
43
3
0
22 Feb 2024
Generalizing Reward Modeling for Out-of-Distribution Preference Learning
Chen Jia
44
2
0
22 Feb 2024
NLAS-multi: A Multilingual Corpus of Automatically Generated Natural Language Argumentation Schemes
Ramon Ruiz-Dolz
Joaquín Taverner
John Lawrence
Chris Reed
51
8
0
22 Feb 2024
COPR: Continual Human Preference Learning via Optimal Policy Regularization
Han Zhang
Lin Gui
Yu Lei
Yuanzhao Zhai
Yehong Zhang
...
Hui Wang
Yue Yu
Kam-Fai Wong
Bin Liang
Ruifeng Xu
CLL
42
4
0
22 Feb 2024
How Important is Domain Specificity in Language Models and Instruction Finetuning for Biomedical Relation Extraction?
Aviv Brokman
Ramakanth Kavuluru
LM&MA
ALM
34
3
0
21 Feb 2024
A Critical Evaluation of AI Feedback for Aligning Large Language Models
Archit Sharma
Sedrick Scott Keh
Eric Mitchell
Chelsea Finn
Kushal Arora
Thomas Kollar
ALM
LLMAG
29
23
0
19 Feb 2024
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Fengqing Jiang
Zhangchen Xu
Luyao Niu
Zhen Xiang
Bhaskar Ramasubramanian
Bo Li
Radha Poovendran
51
89
0
19 Feb 2024
Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM
Zijin Hong
Zheng Yuan
Hao Chen
Qinggang Zhang
Feiran Huang
Xiao Huang
41
24
0
18 Feb 2024
CoLLaVO: Crayon Large Language and Vision mOdel
Byung-Kwan Lee
Beomchan Park
Chae Won Kim
Yonghyun Ro
VLM
MLLM
44
16
0
17 Feb 2024
Operational Collective Intelligence of Humans and Machines
Nikolos Gurney
Fred Morstatter
David Pynadath
Adam Russell
Gleb Satyukov
AI4CE
30
0
0
16 Feb 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
29
17
0
14 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomas Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
377
0
09 Feb 2024
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù
Zdeněk Kasner
Siva Reddy
39
60
0
08 Feb 2024
Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang
Z. Guo
Zeyu Zheng
Daniele Calandriello
Rémi Munos
Mark Rowland
Pierre Harvey Richemond
Michal Valko
Bernardo Avila-Pires
Bilal Piot
34
93
0
08 Feb 2024
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Jost Tobias Springenberg
A. Abdolmaleki
Jingwei Zhang
Oliver Groth
Michael Bloesch
...
Sarah Bechtle
Steven Kapturowski
Roland Hafner
N. Heess
Martin Riedmiller
OffRL
LRM
35
12
0
08 Feb 2024
Toward Human-AI Alignment in Large-Scale Multi-Player Games
Sugandha Sharma
Guy Davidson
Khimya Khetarpal
Anssi Kanervisto
Udit Arora
Katja Hofmann
Ida Momennejad
35
0
0
05 Feb 2024
Gradient-Based Language Model Red Teaming
Nevan Wichers
Carson E. Denison
Ahmad Beirami
24
26
0
30 Jan 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
36
25
0
29 Jan 2024
YODA: Teacher-Student Progressive Learning for Language Models
Jianqiao Lu
Wanjun Zhong
Yufei Wang
Zhijiang Guo
Qi Zhu
...
Baojun Wang
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
LRM
32
7
0
28 Jan 2024
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models
Minbyul Jeong
Jiwoong Sohn
Mujeen Sung
Jaewoo Kang
28
29
0
27 Jan 2024
In-IDE Human-AI Experience in the Era of Large Language Models; A Literature Review
Agnia Sergeyuk
Sergey Titov
Maliheh Izadi
44
6
0
19 Jan 2024
Mathematical Algorithm Design for Deep Learning under Societal and Judicial Constraints: The Algorithmic Transparency Requirement
Holger Boche
Adalbert Fono
Gitta Kutyniok
FaML
41
4
0
18 Jan 2024
Canvil: Designerly Adaptation for LLM-Powered User Experiences
K. J. Kevin Feng
Q. V. Liao
Ziang Xiao
Jennifer Wortman Vaughan
Amy X. Zhang
David W. McDonald
59
17
0
17 Jan 2024
Crowd-PrefRL: Preference-Based Reward Learning from Crowds
David Chhan
Ellen R. Novoseller
Vernon J. Lawhern
42
5
0
17 Jan 2024
Evaluating Language Model Agency through Negotiations
Tim R. Davidson
V. Veselovsky
Martin Josifoski
Maxime Peyrard
Antoine Bosselut
Michal Kosinski
Robert West
LLMAG
39
22
0
09 Jan 2024
Previous
1
2
3
...
6
7
8
...
12
13
14
Next