Deep reinforcement learning from human preferences

12 June 2017
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
arXiv: 1706.03741

Papers citing "Deep reinforcement learning from human preferences" (showing 50 of 701)

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, Lichao Sun · 07 Mar 2023

Navigates Like Me: Understanding How People Evaluate Human-Like AI in Video Games
Stephanie Milani, Arthur Juliani, Ida Momennejad, Raluca Georgescu, Jaroslaw Rzepecki, Alison Shaw, Gavin Costello, Fei Fang, Sam Devlin, Katja Hofmann · 02 Mar 2023

Preference Transformer: Modeling Human Preferences using Transformers for RL
Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee · 02 Mar 2023 · OffRL

How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks
Xuanting Chen, Junjie Ye, Can Zu, Nuo Xu, Rui Zheng, Minlong Peng, Jie Zhou, Tao Gui, Qi Zhang, Xuanjing Huang · 01 Mar 2023 · AI4MH, ELM

Reward Design with Language Models
Minae Kwon, Sang Michael Xie, Kalesha Bullard, Dorsa Sadigh · 27 Feb 2023 · LM&Ro

Active Reward Learning from Online Preferences
Vivek Myers, Erdem Biyik, Dorsa Sadigh · 27 Feb 2023 · OffRL

Diffusion Model-Augmented Behavioral Cloning
Shangcheng Chen, Hsiang-Chun Wang, Ming-Hao Hsu, Chun-Mao Lai, Shao-Hua Sun · 26 Feb 2023 · DiffM

Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws
Kush S. Bhatia, Wenshuo Guo, Jacob Steinhardt · 23 Feb 2023

Machine Love
Joel Lehman · 18 Feb 2023

Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning
Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati · 17 Feb 2023

A State Augmentation based approach to Reinforcement Learning from Human Preferences
Mudit Verma, Subbarao Kambhampati · 17 Feb 2023

Auditing large language models: a three-layered approach
Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, Luciano Floridi · 16 Feb 2023 · AILaw, MLAU

The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, ..., Tom B. Brown, C. Olah, Jack Clark, Sam Bowman, Jared Kaplan · 15 Feb 2023 · LRM, ReLM

Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning
Yunke Wang, Bo Du, Chang Xu · 13 Feb 2023

COACH: Cooperative Robot Teaching
Cunjun Yu, Yiqing Xu, Linfeng Li, David Hsu · 13 Feb 2023

Synthesizing Human Gaze Feedback for Improved NLP Performance
Varun Khurana, Yaman Kumar Singla, Nora Hollenstein, R. Kumar, Balaji Krishnamurthy · 11 Feb 2023

Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
Banghua Zhu, Jiantao Jiao, Michael I. Jordan · 26 Jan 2023 · OffRL

ASQ-IT: Interactive Explanations for Reinforcement-Learning Agents
Yotam Amitai, Guy Avni, Ofra Amir · 24 Jan 2023

Dissociating language and thought in large language models
Kyle Mahowald, Anna A. Ivanova, I. Blank, Nancy Kanwisher, J. Tenenbaum, Evelina Fedorenko · 16 Jan 2023 · ELM, ReLM

On The Fragility of Learned Reward Functions
Lev McKinney, Yawen Duan, David M. Krueger, Adam Gleave · 09 Jan 2023

"No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy
Yuchen Cui, Siddharth Karamcheti, Raj Palleti, Nidhya Shivakumar, Percy Liang, Dorsa Sadigh · 06 Jan 2023 · LM&Ro

Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes
Justin Reppert, Ben Rachbach, Charlie George, Luke Stebbing, Ju-Seung Byun, Maggie Appleton, Andreas Stuhlmüller · 04 Jan 2023 · ReLM, LRM

Benchmarks and Algorithms for Offline Preference-Based Reward Learning
Daniel Shin, Anca Dragan, Daniel S. Brown · 03 Jan 2023 · OffRL

Genetic Imitation Learning by Reward Extrapolation
Boyuan Zheng, Jianlong Zhou, Fang Chen · 03 Jan 2023

SIRL: Similarity-based Implicit Representation Learning
Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca Dragan · 02 Jan 2023 · SSL, DRL

Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu, Chenyan Jia, Ge Zhang, Ziyu Zhuang, Tony X. Liu, Soroush Vosoughi · 01 Jan 2023

Towards automating Codenames spymasters with deep reinforcement learning
Sherman Siu · 28 Dec 2022

Inclusive Artificial Intelligence
Dilip Arumugam, Shi Dong, Benjamin Van Roy · 24 Dec 2022

OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Srinivasan Iyer, Xi Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, ..., Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, Veselin Stoyanov · 22 Dec 2022 · ALM

Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, ..., Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan · 19 Dec 2022 · ALM

Training Robots to Evaluate Robots: Example-Based Interactive Reward Functions for Policy Learning
Kun-Yen Huang, E. Hu, Dinesh Jayaraman · 17 Dec 2022 · OffRL

Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, John Kernion, ..., Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom B. Brown, Jared Kaplan · 15 Dec 2022 · SyDa, MoMe

Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt · 07 Dec 2022

Few-Shot Preference Learning for Human-in-the-Loop RL
Joey Hejna, Dorsa Sadigh · 06 Dec 2022 · OffRL

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking
David Zhang, Micah Carroll, Andreea Bobu, Anca Dragan · 30 Nov 2022

Fine-tuning language models to find agreement among humans with diverse preferences
Michiel A. Bakker, Martin Chadwick, Hannah R. Sheahan, Michael Henry Tessler, Lucy Campbell-Gillingham, ..., Nat McAleese, Amelia Glaese, John Aslanides, M. Botvinick, Christopher Summerfield · 28 Nov 2022 · ALM

Actively Learning Costly Reward Functions for Reinforcement Learning
André Eberhard, Houssam Metni, G. Fahland, A. Stroh, Pascal Friederich · 23 Nov 2022 · OffRL

imitation: Clean Imitation Learning Implementations
Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, M. Ernestus, Nora Belrose, Scott Emmons, Stuart J. Russell · 22 Nov 2022 · MLAU

Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning
Katherine Metcalf, Miguel Sarabia, B. Theobald · 12 Nov 2022 · OffRL

The CRINGE Loss: Learning what language not to model
Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston · 10 Nov 2022 · MU

Zero-shot Visual Commonsense Immorality Prediction
Yujin Jeong, Seongbeom Park, Suhong Moon, Jinkyu Kim · 10 Nov 2022 · VLM

Measuring Progress on Scalable Oversight for Large Language Models
Sam Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, ..., Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Benjamin Mann, Jared Kaplan · 04 Nov 2022 · ALM, ELM

LMentry: A Language Model Benchmark of Elementary Language Tasks
Avia Efrat, Or Honovich, Omer Levy · 03 Nov 2022

TaTa: A Multilingual Table-to-Text Dataset for African Languages
Sebastian Gehrmann, Sebastian Ruder, Vitaly Nikolaev, Jan A. Botha, Michael Chavinda, Ankur P. Parikh, Clara E. Rivera · 31 Oct 2022 · LMTD

Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision
Ashvin Nair, Brian Zhu, Gokul Narayanan, Eugen Solowjow, Sergey Levine · 27 Oct 2022 · OffRL, OnRL

Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion
Utkarsh Soni, Nupur Thakur, S. Sreedharan, L. Guan, Mudit Verma, Matthew Marquez, Subbarao Kambhampati · 27 Oct 2022

Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook
Baihan Lin · 24 Oct 2022 · OffRL, AI4TS

Safe Policy Improvement in Constrained Markov Decision Processes
Luigi Berducci, Radu Grosu · 20 Oct 2022 · OffRL

Scaling Laws for Reward Model Overoptimization
Leo Gao, John Schulman, Jacob Hilton · 19 Oct 2022 · ALM

Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Luke Vilnis, Yury Zemlyanskiy, Patrick C. Murray, Alexandre Passos, Sumit Sanghai · 18 Oct 2022