Self-Modification of Policy and Utility Function in Rational Agents

10 May 2016

Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter

Papers citing "Self-Modification of Policy and Utility Function in Rational Agents"

Title
Reward Shaping to Mitigate Reward Hacking in RLHF Jiayi Fu Xuandong Zhao Chengyuan Yao Han Wang Qi Han Yanghua Xiao 202 14 0 26 Feb 2025
Towards shutdownable agents via stochastic choice Elliott Thornley Alexander Roman Christos Ziakas Leyton Ho Louis Thomson 140 0 0 30 Jun 2024
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective Tom Everitt Marcus Hutter Ramana Kumar Victoria Krakovna 105 97 0 13 Aug 2019
Corrigibility with Utility Preservation K. Holtman KELM 45 9 0 05 Aug 2019
Modeling AGI Safety Frameworks with Causal Influence Diagrams Tom Everitt Ramana Kumar Victoria Krakovna Shane Legg AI4CE 67 22 0 20 Jun 2019
AGI Safety Literature Review Tom Everitt G. Lea Marcus Hutter AI4CE 86 116 0 03 May 2018
'Indifference' methods for managing agent rewards Stuart Armstrong Xavier O'Rourke 89 19 0 18 Dec 2017
AI Safety Gridworlds Jan Leike Miljan Martic Victoria Krakovna Pedro A. Ortega Tom Everitt Andrew Lefrancq Laurent Orseau Shane Legg 158 255 0 27 Nov 2017
Nonparametric General Reinforcement Learning Jan Leike OffRL 105 26 0 28 Nov 2016
Concrete Problems in AI Safety Dario Amodei C. Olah Jacob Steinhardt Paul Christiano John Schulman Dandelion Mané 315 2,407 0 21 Jun 2016
Avoiding Wireheading with Value Reinforcement Learning Tom Everitt Marcus Hutter AI4CE 129 44 0 10 May 2016