Reward-rational (implicit) choice: A unifying formalism for reward learning

12 February 2020

Papers citing "Reward-rational (implicit) choice: A unifying formalism for reward learning"


Optimal Interactive Learning on the Job via Facility Location Planning | Shivam Vats, Michelle Zhao, Patrick Callaghan, Mingxi Jia, Maxim Likhachev, Oliver Kroemer, George Konidaris | 29 0 0 | 01 May 2025
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback | Nan Lu, Ethan X. Fang, Junwei Lu | 102 0 0 | 27 Apr 2025
Preference-Guided Reinforcement Learning for Efficient Exploration | Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen, Xuyang Chen, Lin Zhao | 38 0 0 | 09 Jul 2024
Pareto-Optimal Learning from Preferences with Hidden Context | Ryan Boldi, Li Ding, Lee Spector, S. Niekum | 62 6 0 | 21 Jun 2024
A Generalized Acquisition Function for Preference-based Reward Learning | Evan Ellis, Gaurav R. Ghosal, Stuart J. Russell, Anca Dragan, Erdem Biyik | 29 1 0 | 09 Mar 2024
BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks | Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Rohin Shah | 21 6 0 | 05 Dec 2023
Designing Fiduciary Artificial Intelligence | Sebastian Benthall, David Shekman | 43 3 0 | 27 Jul 2023
Decision-Oriented Dialogue for Human-AI Collaboration | Jessy Lin, Nicholas Tomlin, Jacob Andreas, J. Eisner | LLMAG 18 26 0 | 31 May 2023
Benchmarks and Algorithms for Offline Preference-Based Reward Learning | Daniel Shin, Anca Dragan, Daniel S. Brown | OffRL 11 53 0 | 03 Jan 2023
SIRL: Similarity-based Implicit Representation Learning | Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca Dragan | SSL DRL 14 17 0 | 02 Jan 2023
Learning Latent Representations to Co-Adapt to Humans | Sagar Parekh, Dylan P. Losey | 18 12 0 | 19 Dec 2022
Tight Performance Guarantees of Imitator Policies with Continuous Actions | Davide Maran, Alberto Maria Metelli, Marcello Restelli | OffRL 15 4 0 | 07 Dec 2022
RISO: Combining Rigid Grippers with Soft Switchable Adhesives | Shaunak A. Mehta, Yeunhee Kim, Joshua Hoegerman, Michael D. Bartlett, Dylan P. Losey | 11 4 0 | 27 Oct 2022
Environment Design for Inverse Reinforcement Learning | Thomas Kleine Buening, Victor Villin, Christos Dimitrakakis | 30 1 0 | 26 Oct 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans | John J. Nay | ELM AILaw 84 27 0 | 14 Sep 2022
Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning | David Lindner, Mennatallah El-Assady | OffRL 25 16 0 | 27 Jun 2022
How to talk so AI will learn: Instructions, descriptions, and autonomy | T. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L. Griffiths, Dylan Hadfield-Menell | LM&Ro 26 20 0 | 16 Jun 2022
Self-critiquing models for assisting human evaluators | William Saunders, Catherine Yeh, Jeff Wu, Steven Bills, Ouyang Long, Jonathan Ward, Jan Leike | ALM ELM 21 279 0 | 12 Jun 2022
Encouraging Human Interaction with Robot Teams: Legible and Fair Subtask Allocations | Soheil Habibian, Dylan P. Losey | 11 10 0 | 06 May 2022
Inferring Rewards from Language in Context | Jessy Lin, Daniel Fried, Dan Klein, Anca Dragan | LM&Ro 19 54 0 | 05 Apr 2022
A Primer on Maximum Causal Entropy Inverse Reinforcement Learning | Adam Gleave, Sam Toyer | 13 13 0 | 22 Mar 2022
A Ranking Game for Imitation Learning | Harshit S. Sikchi, Akanksha Saran, Wonjoon Goo, S. Niekum | OffRL 17 22 0 | 07 Feb 2022
Safe Deep RL in 3D Environments using Human Feedback | Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, Jan Leike | 24 4 0 | 20 Jan 2022
On the Expressivity of Markov Reward | David Abel, Will Dabney, A. Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh | 13 82 0 | 01 Nov 2021
Risk Averse Bayesian Reward Learning for Autonomous Navigation from Human Demonstration | Christian Ellis, Maggie B. Wigness, J. Rogers, Craig T. Lennon, L. Fiondella | 65 6 0 | 31 Jul 2021
The MineRL BASALT Competition on Learning from Human Feedback | Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, ..., Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart J. Russell, Anca Dragan | 22 30 0 | 05 Jul 2021
Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits | Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, A. Pananjady | 11 8 0 | 28 Jun 2021
Uncertain Decisions Facilitate Better Preference Learning | Cassidy Laidlaw, Stuart J. Russell | 25 10 0 | 19 Jun 2021
Open Problems in Cooperative AI | Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, T. Graepel | 11 199 0 | 15 Dec 2020
Speaker-Follower Models for Vision-and-Language Navigation | Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell | LM&Ro LRM 246 496 0 | 07 Jun 2018