ResearchTrend.AI
Quantifying Differences in Reward Functions

24 June 2020
Adam Gleave
Michael Dennis
Shane Legg
Stuart J. Russell
Jan Leike
    OffRL

Papers citing "Quantifying Differences in Reward Functions"

17 / 17 papers shown
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho
Seokhun Ju
Seungyub Han
Dohyeong Kim
Kyungjae Lee
Jungwoo Lee
OffRL
29
0
0
06 May 2025
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
Xueru Wen
Jie Lou
Y. Lu
Hongyu Lin
Xing Yu
Xinyu Lu
Ben He
Xianpei Han
Debing Zhang
Le Sun
ALM
61
4
0
17 Feb 2025
Preserving the Privacy of Reward Functions in MDPs through Deception
Shashank Reddy Chirra
Pradeep Varakantham
P. Paruchuri
35
0
0
13 Jul 2024
A Generalized Acquisition Function for Preference-based Reward Learning
Evan Ellis
Gaurav R. Ghosal
Stuart J. Russell
Anca Dragan
Erdem Biyik
34
1
0
09 Mar 2024
Designing Fiduciary Artificial Intelligence
Sebastian Benthall
David Shekman
48
4
0
27 Jul 2023
Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback
Tom Bewley
J. Lawry
Arthur G. Richards
30
1
0
26 May 2023
On The Fragility of Learned Reward Functions
Lev McKinney
Yawen Duan
David M. Krueger
Adam Gleave
28
20
0
09 Jan 2023
Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities
Jasmina Gajcin
Ivana Dusparic
CML
OffRL
35
8
0
21 Oct 2022
Symbol Guided Hindsight Priors for Reward Learning from Human Preferences
Mudit Verma
Katherine Metcalf
27
8
0
17 Oct 2022
Calculus on MDPs: Potential Shaping as a Gradient
Erik Jenner
H. V. Hoof
Adam Gleave
19
4
0
20 Aug 2022
Reward Uncertainty for Exploration in Preference-based Reinforcement Learning
Xinran Liang
Katherine Shu
Kimin Lee
Pieter Abbeel
16
58
0
24 May 2022
A Primer on Maximum Causal Entropy Inverse Reinforcement Learning
Adam Gleave
Sam Toyer
21
13
0
22 Mar 2022
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
Jack Parker-Holder
Raghunandan Rajan
Xingyou Song
André Biedenkapp
Yingjie Miao
...
Vu-Linh Nguyen
Roberto Calandra
Aleksandra Faust
Frank Hutter
Marius Lindauer
AI4CE
33
100
0
11 Jan 2022
Uncertain Decisions Facilitate Better Preference Learning
Cassidy Laidlaw
Stuart J. Russell
30
10
0
19 Jun 2021
Inverse Reinforcement Learning with Explicit Policy Estimates
Navyata Sanghvi
Shinnosuke Usami
Mohit Sharma
J. Groeger
Kris M. Kitani
CML
23
6
0
04 Mar 2021
Understanding Learned Reward Functions
Eric J. Michaud
Adam Gleave
Stuart J. Russell
XAI
OffRL
22
33
0
10 Dec 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
280
1,587
0
18 Sep 2019