arXiv:2203.13553
Preprocessing Reward Functions for Interpretability
25 March 2022
Erik Jenner
Adam Gleave
Papers citing "Preprocessing Reward Functions for Interpretability"
Title
Explaining Learned Reward Functions with Counterfactual Trajectories
Jan Wehner
Frans Oliehoek
Luciano Cavalcante Siebert
29
0
0
07 Feb 2024
Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback
Tom Bewley
J. Lawry
Arthur G. Richards
30
1
0
26 May 2023
Reward Learning with Trees: Methods and Evaluation
Tom Bewley
J. Lawry
Arthur G. Richards
R. Craddock
Ian Henderson
23
1
0
03 Oct 2022
Calculus on MDPs: Potential Shaping as a Gradient
Erik Jenner
H. V. Hoof
Adam Gleave
22
4
0
20 Aug 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
280
1,595
0
18 Sep 2019