2204.10018
Cited By
Path-Specific Objectives for Safer Agent Incentives
21 April 2022
Sebastian Farquhar
Ryan Carey
Tom Everitt
Papers citing "Path-Specific Objectives for Safer Agent Incentives"
6 / 6 papers shown
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar
Vikrant Varma
David Lindner
David Elson
Caleb Biddulph
Ian Goodfellow
Rohin Shah
88
1
0
22 Jan 2025
On Imperfect Recall in Multi-Agent Influence Diagrams
James Fox
Matt MacDermott
Lewis Hammond
Paul Harrenstein
Alessandro Abate
Michael Wooldridge
29
3
0
11 Jul 2023
Solutions to preference manipulation in recommender systems require knowledge of meta-preferences
Hal Ashton
Matija Franklin
13
5
0
14 Sep 2022
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
65
183
0
30 Aug 2022
Counterfactual harm
Jonathan G. Richens
R. Beard
Daniel H. Thompson
29
27
0
27 Apr 2022
A Complete Criterion for Value of Information in Soluble Influence Diagrams
Chris van Merwijk
Ryan Carey
Tom Everitt
24
5
0
23 Feb 2022