Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2011.08827
Cited By
Avoiding Tampering Incentives in Deep RL via Decoupled Approval
17 November 2020
J. Uesato
Ramana Kumar
Victoria Krakovna
Tom Everitt
Richard Ngo
Shane Legg
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Avoiding Tampering Incentives in Deep RL via Decoupled Approval"
5 / 5 papers shown
Title
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar
Vikrant Varma
David Lindner
David Elson
Caleb Biddulph
Ian Goodfellow
Rohin Shah
82
1
0
22 Jan 2025
Defining and Characterizing Reward Hacking
Joar Skalse
Nikolaus H. R. Howe
Dmitrii Krasheninnikov
David M. Krueger
57
54
0
27 Sep 2022
AI safety via debate
G. Irving
Paul Christiano
Dario Amodei
199
199
0
02 May 2018
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn
Pieter Abbeel
Sergey Levine
OOD
278
11,677
0
09 Mar 2017
1