Inverse Reward Design
Dylan Hadfield-Menell, S. Milli, Pieter Abbeel, Stuart J. Russell, Anca Dragan
arXiv:1711.02827, 8 November 2017

Papers citing "Inverse Reward Design" (50 of 97 shown)
Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society
Feifei Zhao, Yufei Wang, Enmeng Lu, Dongcheng Zhao, Bing Han, ..., Chao Liu, Yaodong Yang, Yi Zeng, Boyuan Chen, Jinyu Fan. 24 Apr 2025.

Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning
Younghwan Lee, Tung M. Luu, Donghoon Lee, Chang D. Yoo. [3DV, VLM, OffRL] 03 Apr 2025.

Reward Training Wheels: Adaptive Auxiliary Rewards for Robotics Reinforcement Learning
Linji Wang, Tong Xu, Yuanjie Lu, Xuesu Xiao. 19 Mar 2025.

Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm
Haksub Kim, Kanghoon Lee, J. Park, Jiachen Li, Jinkyoo Park. 05 Mar 2025.

Societal Alignment Frameworks Can Improve LLM Alignment
Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, ..., Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy. 27 Feb 2025.

Your Learned Constraint is Secretly a Backward Reachable Tube
Mohamad Qadri, Gokul Swamy, Jonathan Francis, Michael Kaess, Andrea Bajcsy. 26 Jan 2025.

Evolution and The Knightian Blindspot of Machine Learning
Joel Lehman, Elliot Meyerson, Tarek El-Gaaly, Kenneth O. Stanley, Tarin Ziyaee. 22 Jan 2025.

Learning to Assist Humans without Inferring Rewards
Vivek Myers, Evan Ellis, Sergey Levine, Benjamin Eysenbach, Anca Dragan. 17 Jan 2025.

Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation
N. Dennler, Stefanos Nikolaidis, Maja J. Matarić. 03 Jan 2025.

Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications
Sinan Ibrahim, Mostafa Mostafa, Ali Jnadi, Hadi Salloum, Pavel Osinenko. [OffRL] 31 Dec 2024.

Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences
Z. Liu, Junjie Xu, Xingjiao Wu, J. Yang, Liang He. 11 Sep 2024.

Preference-Guided Reinforcement Learning for Efficient Exploration
Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen, Xuyang Chen, Lin Zhao. 09 Jul 2024.

Learning Reward for Robot Skills Using Large Language Models via Self-Alignment
Yuwei Zeng, Yao Mu, Lin Shao. 12 May 2024.

Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Calarina Muslimani, Matthew E. Taylor. [OffRL] 30 Apr 2024.

Inverse Reinforcement Learning by Estimating Expertise of Demonstrators
M. Beliaev, Ramtin Pedarsani. 02 Feb 2024.

Cross Fertilizing Empathy from Brain to Machine as a Value Alignment Strategy
Devin Gonier, Adrian Adduci, Cassidy LoCascio. 10 Dec 2023.

Increasing Transparency of Reinforcement Learning using Shielding for Human Preferences and Explanations
Georgios Angelopoulos, Luigi Mangiacapra, Alessandra Rossi, C. Napoli, Silvia Rossi. 28 Nov 2023.

Learning Reward for Physical Skills using Large Language Model
Yuwei Zeng, Yiqing Xu. 21 Oct 2023.

Designing Fiduciary Artificial Intelligence
Sebastian Benthall, David Shekman. 27 Jul 2023.

Reward Design with Language Models
Minae Kwon, Sang Michael Xie, Kalesha Bullard, Dorsa Sadigh. [LM&Ro] 27 Feb 2023.

Active Reward Learning from Online Preferences
Vivek Myers, Erdem Biyik, Dorsa Sadigh. [OffRL] 27 Feb 2023.

Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws
Kush S. Bhatia, Wenshuo Guo, Jacob Steinhardt. 23 Feb 2023.

Machine Love
Joel Lehman. 18 Feb 2023.

A State Augmentation based approach to Reinforcement Learning from Human Preferences
Mudit Verma, Subbarao Kambhampati. 17 Feb 2023.

Goal Alignment: A Human-Aware Account of Value Alignment Problem
Malek Mechergui, S. Sreedharan. 02 Feb 2023.

Few-Shot Preference Learning for Human-in-the-Loop RL
Joey Hejna, Dorsa Sadigh. [OffRL] 06 Dec 2022.

Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
Daniel Fried, Nicholas Tomlin, Jennifer Hu, Roma Patel, Aida Nematzadeh. 15 Nov 2022.

Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning
Katherine Metcalf, Miguel Sarabia, B. Theobald. [OffRL] 12 Nov 2022.

Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion
Utkarsh Soni, Nupur Thakur, S. Sreedharan, L. Guan, Mudit Verma, Matthew Marquez, Subbarao Kambhampati. 27 Oct 2022.

Learning Control Admissibility Models with Graph Neural Networks for Multi-Agent Navigation
Chenning Yu, Hong-Den Yu, Sicun Gao. 17 Oct 2022.

Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning
Jifeng Hu, Yanchao Sun, Hechang Chen, Sili Huang, Haiyin Piao, Yi-Ju Chang, Lichao Sun. 14 Oct 2022.

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, J. Uesato, Zachary Kenton. 04 Oct 2022.

Reward Shaping for User Satisfaction in a REINFORCE Recommender
Konstantina Christakopoulou, Can Xu, Sai Zhang, Sriraj Badam, Trevor Potter, ..., Ya Le, Chris Berg, E. B. Dixon, Ed H. Chi, Minmin Chen. [OffRL] 30 Sep 2022.

Defining and Characterizing Reward Hacking
Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David M. Krueger. 27 Sep 2022.

Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
John J. Nay. [ELM, AILaw] 14 Sep 2022.

Task-Agnostic Learning to Accomplish New Tasks
Xianqi Zhang, Xingtao Wang, Xu Liu, Wenrui Wang, Xiaopeng Fan, Debin Zhao. [OffRL] 09 Sep 2022.

Improved Policy Optimization for Online Imitation Learning
J. Lavington, Sharan Vaswani, Mark Schmidt. [OffRL] 29 Jul 2022.

How to talk so AI will learn: Instructions, descriptions, and autonomy
T. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas Griffiths, Dylan Hadfield-Menell. [LM&Ro] 16 Jun 2022.

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning
Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel. 24 May 2022.

Aligning Robot Representations with Humans
Andreea Bobu, Andi Peng. 15 May 2022.

Correcting Robot Plans with Natural Language Feedback
Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, Dieter Fox. [3DV, LM&Ro] 11 Apr 2022.

Inferring Rewards from Language in Context
Jessy Lin, Daniel Fried, Dan Klein, Anca Dragan. [LM&Ro] 05 Apr 2022.

The dangers in algorithms learning humans' values and irrationalities
Rebecca Gormann, Stuart Armstrong. 28 Feb 2022.

Inducing Structure in Reward Learning by Learning Features
Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca Dragan. 18 Jan 2022.

The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
Alexander Pan, Kush S. Bhatia, Jacob Steinhardt. 10 Jan 2022.

Programmatic Reward Design by Example
Weichao Zhou, Wenchao Li. 14 Dec 2021.

Learning Perceptual Concepts by Bootstrapping from Human Queries
Andreea Bobu, Chris Paxton, Wei Yang, Balakumar Sundaralingam, Yu-Wei Chao, Maya Cakmak, Dieter Fox. [SSL] 09 Nov 2021.

B-Pref: Benchmarking Preference-Based Reinforcement Learning
Kimin Lee, Laura M. Smith, Anca Dragan, Pieter Abbeel. [OffRL] 04 Nov 2021.

On the Expressivity of Markov Reward
David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh. 01 Nov 2021.

Medical Dead-ends and Learning to Identify High-risk States and Treatments
Mehdi Fatemi, Taylor W. Killian, J. Subramanian, Marzyeh Ghassemi. [OffRL] 08 Oct 2021.