1605.03142
Cited By
Self-Modification of Policy and Utility Function in Rational Agents
10 May 2016
Tom Everitt
Daniel Filan
Mayank Daswani
Marcus Hutter
Papers citing
"Self-Modification of Policy and Utility Function in Rational Agents"
11 / 11 papers shown
Title
Reward Shaping to Mitigate Reward Hacking in RLHF
Jiayi Fu
Xuandong Zhao
Chengyuan Yao
Han Wang
Qi Han
Yanghua Xiao
202
14
0
26 Feb 2025
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
140
0
0
30 Jun 2024
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
Tom Everitt
Marcus Hutter
Ramana Kumar
Victoria Krakovna
105
97
0
13 Aug 2019
Corrigibility with Utility Preservation
K. Holtman
KELM
45
9
0
05 Aug 2019
Modeling AGI Safety Frameworks with Causal Influence Diagrams
Tom Everitt
Ramana Kumar
Victoria Krakovna
Shane Legg
AI4CE
67
22
0
20 Jun 2019
AGI Safety Literature Review
Tom Everitt
G. Lea
Marcus Hutter
AI4CE
86
116
0
03 May 2018
'Indifference' methods for managing agent rewards
Stuart Armstrong
Xavier O'Rourke
89
19
0
18 Dec 2017
AI Safety Gridworlds
Jan Leike
Miljan Martic
Victoria Krakovna
Pedro A. Ortega
Tom Everitt
Andrew Lefrancq
Laurent Orseau
Shane Legg
158
255
0
27 Nov 2017
Nonparametric General Reinforcement Learning
Jan Leike
OffRL
105
26
0
28 Nov 2016
Concrete Problems in AI Safety
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané
315
2,407
0
21 Jun 2016
Avoiding Wireheading with Value Reinforcement Learning
Tom Everitt
Marcus Hutter
AI4CE
129
44
0
10 May 2016