ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.13553
  4. Cited By
Preprocessing Reward Functions for Interpretability

Preprocessing Reward Functions for Interpretability

25 March 2022
Erik Jenner
Adam Gleave
ArXivPDFHTML

Papers citing "Preprocessing Reward Functions for Interpretability"

5 / 5 papers shown
Title
Explaining Learned Reward Functions with Counterfactual Trajectories
Explaining Learned Reward Functions with Counterfactual Trajectories
Jan Wehner
Frans Oliehoek
Luciano Cavalcante Siebert
29
0
0
07 Feb 2024
Learning Interpretable Models of Aircraft Handling Behaviour by
  Reinforcement Learning from Human Feedback
Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback
Tom Bewley
J. Lawry
Arthur G. Richards
30
1
0
26 May 2023
Reward Learning with Trees: Methods and Evaluation
Reward Learning with Trees: Methods and Evaluation
Tom Bewley
J. Lawry
Arthur G. Richards
R. Craddock
Ian Henderson
23
1
0
03 Oct 2022
Calculus on MDPs: Potential Shaping as a Gradient
Calculus on MDPs: Potential Shaping as a Gradient
Erik Jenner
H. V. Hoof
Adam Gleave
22
4
0
20 Aug 2022
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
280
1,595
0
18 Sep 2019
1