Avoiding Wireheading with Value Reinforcement Learning

10 May 2016

Tom Everitt, Marcus Hutter

Papers citing "Avoiding Wireheading with Value Reinforcement Learning"

Title | Authors | Topic tag | Counts (unlabeled in source) | Date
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking | Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, Rohin Shah | | 178 2 0 | 22 Jan 2025
Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning | Paul Rolland, Luca Viano, Norman Schuerhoff, Boris Nikolov, Volkan Cevher | OffRL | 81 14 0 | 22 Sep 2022
Morality, Machines and the Interpretation Problem: A Value-based, Wittgensteinian Approach to Building Moral Agents | C. Badea, Gregory Artus | | 106 9 0 | 03 Mar 2021
REALab: An Embedded Perspective on Tampering | Ramana Kumar, J. Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg | | 72 10 0 | 17 Nov 2020
Positive-Unlabeled Reward Learning | Danfei Xu, Misha Denil | | 82 38 0 | 01 Nov 2019
Rethinking Formal Models of Partially Observable Multiagent Decision Making | Vojtěch Kovařík, Martin Schmid, Neil Burch, Michael Bowling, Viliam Lisý | OffRL | 144 56 0 | 26 Jun 2019
Categorizing Wireheading in Partially Embedded Agents | Arushi G. K. Majha, Sayan Sarkar, Davide Zagami | | 36 3 0 | 21 Jun 2019
Imitation Learning from Imperfect Demonstration | Yueh-hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama | | 73 162 0 | 27 Jan 2019
Emergence of Addictive Behaviors in Reinforcement Learning Agents | Vahid Behzadan, Roman V. Yampolskiy, Arslan Munir | | 30 5 0 | 14 Nov 2018
The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities | Joel Lehman, Jeff Clune, D. Misevic, C. Adami, L. Altenberg, ..., Danesh Tarapore, S. Thibault, Westley Weimer, R. Watson, Jason Yosinski | | 177 282 0 | 09 Mar 2018
Occam's razor is insufficient to infer the preferences of irrational agents | Stuart Armstrong, Sören Mindermann | | 102 93 0 | 15 Dec 2017
Nonparametric General Reinforcement Learning | Jan Leike | OffRL | 105 26 0 | 28 Nov 2016
Concrete Problems in AI Safety | Dario Amodei, C. Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dandelion Mané | | 315 2,407 0 | 21 Jun 2016
Self-Modification of Policy and Utility Function in Rational Agents | Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter | | 77 29 0 | 10 May 2016