Inverse Reward Design

8 November 2017

Dylan Hadfield-Menell

Pieter Abbeel

Papers citing "Inverse Reward Design"

50 / 265 papers shown

Dataset Poisoning Attacks on Behavioral Cloning Policies

247

26 Nov 2025

Learning Where, What and How to Transfer: A Multi-Role Reinforcement Learning Approach for Evolutionary Multitasking

208

19 Nov 2025

PROF: An LLM-based Reward Code Preference Optimization Framework for Offline Imitation Learning

307

14 Nov 2025

Large Language Models Develop Novel Social Biases Through Adaptive Exploration

196

08 Nov 2025

Restoring Noisy Demonstration for Imitation Learning With Diffusion Models

IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025

132

16 Oct 2025

Training LLM Agents to Empower Humans

194

15 Oct 2025

Repairing Reward Functions with Feedback to Mitigate Reward Hacking

Stephane Hatgis-Kessell

Logan Mondal Bhamidipaty

Emma Brunskill

125

14 Oct 2025

Control Synthesis of Cyber-Physical Systems for Real-Time Specifications through Causation-Guided Reinforcement Learning

101

09 Oct 2025

Learning from Failures: Understanding LLM Alignment through Failure-Aware Inverse RL

101

07 Oct 2025

Failure Modes of Maximum Entropy RLHF

Ömer Veysel Çağatan

Barış Akgün

123

24 Sep 2025

Self-Supervised Goal-Reaching Results in Multi-Agent Cooperation and Exploration

165

12 Sep 2025

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

...

193

11 Sep 2025

Symmetry-Guided Multi-Agent Inverse Reinforcement Learning

165

10 Sep 2025

Text2Touch: Tactile In-Hand Manipulation with LLM-Designed Reward Functions

09 Sep 2025

An Economy of AI Agents

Gillian K. Hadfield

Andrew Koh

204

01 Sep 2025

GPLight+: A Genetic Programming Method for Learning Symmetric Traffic Signal Control Policy

IEEE Transactions on Evolutionary Computation (IEEE Trans. Evol. Comput.), 2025

Xiao-Cheng Liao

Yi Mei

Mengjie Zhang

102

22 Aug 2025

Learning from Preferences and Mixed Demonstrations in General Settings

Jason Brown

Carl Henrik Ek

Robert D. Mullins

135

19 Aug 2025

Causal Reward Adjustment: Mitigating Reward Hacking in External Reasoning via Backdoor Correction

119

06 Aug 2025

Policy Learning from Large Vision-Language Model Feedback without Reward Modeling

180

31 Jul 2025

Inference-Time Reward Hacking in Large Language Models

243

24 Jun 2025

^2: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning

Brahim Driss

Alex Davey

Riad Akrour

201

16 Jun 2025

Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design

Andreas Schlaginhaufen

Reda Ouhamma

Maryam Kamgarpour

243

11 Jun 2025

Provable Reinforcement Learning from Human Feedback with an Unknown Link Function

Qining Zhang

Lei Ying

274

03 Jun 2025

Apprenticeship learning with prior beliefs using inverse optimization

Mauricio Junca

Esteban Leiva

202

27 May 2025

Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning

Kalyan Cherukuri

Aarav Lala

234

17 May 2025

Super Co-alignment of Human and AI for Sustainable Symbiotic Society

...

622

24 Apr 2025

FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions

IEEE International Conference on Robotics and Automation (ICRA), 2025

403

14 Apr 2025

Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

321

03 Apr 2025

Reward Training Wheels: Adaptive Auxiliary Rewards for Robotics Reinforcement Learning

313

19 Mar 2025

Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm

428

05 Mar 2025

Societal Alignment Frameworks Can Improve LLM Alignment

...

1.0K

27 Feb 2025

Your Learned Constraint is Secretly a Backward Reachable Tube

435

26 Jan 2025

Evolution and The Knightian Blindspot of Machine Learning

342

22 Jan 2025

Learning to Assist Humans without Inferring Rewards

Neural Information Processing Systems (NeurIPS), 2024

596

17 Jan 2025

Robustness in the Face of Partial Identifiability in Reward Learning

Filippo Lazzati

Alberto Maria Metelli

234

10 Jan 2025

Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation

IEEE/ACM International Conference on Human-Robot Interaction (HRI), 2025

N. Dennler

Stefanos Nikolaidis

Maja J. Matarić

930

03 Jan 2025

Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications

IEEE Access (IEEE Access), 2024

313

31 Dec 2024

Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker

262

31 Dec 2024

LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency

338

30 Dec 2024

Active Inference and Human–Computer Interaction

189

19 Dec 2024

PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement

386

26 Nov 2024

Robot See, Robot Do: Imitation Reward for Noisy Financial Environments

BigData Congress [Services Society] (BSS), 2024

212

13 Nov 2024

Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment

Neural Information Processing Systems (NeurIPS), 2024

Weichao Zhou

Wenchao Li

249

31 Oct 2024

A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning

Knowledge-Based Systems (KBS), 2024

Xiu Li

247

18 Oct 2024

Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards

Changjie Fan

349

08 Oct 2024

Adaptive Language-Guided Abstraction from Contrastive Explanations

Conference on Robot Learning (CoRL), 2024

Andi Peng

Belinda Z. Li

Ilia Sucholutsky

Nishanth Kumar

Julie A. Shah

Jacob Andreas

Andreea Bobu

OffRL

229

12 Sep 2024

Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences

IEEE International Conference on Robotics and Automation (ICRA), 2024

Liang He

314

11 Sep 2024

A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals

International Conference on Learning Representations (ICLR), 2024

401

11 Aug 2024

Preference-Guided Reinforcement Learning for Efficient Exploration

278

09 Jul 2024

Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment

310

06 Jun 2024