Path-Specific Objectives for Safer Agent Incentives

21 April 2022

Papers citing "Path-Specific Objectives for Safer Agent Incentives"

Title
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, Rohin Shah
88 1 0
22 Jan 2025

On Imperfect Recall in Multi-Agent Influence Diagrams
James Fox, Matt MacDermott, Lewis Hammond, Paul Harrenstein, Alessandro Abate, Michael Wooldridge
29 3 0
11 Jul 2023

Solutions to preference manipulation in recommender systems require knowledge of meta-preferences
Hal Ashton, Matija Franklin
13 5 0
14 Sep 2022

The Alignment Problem from a Deep Learning Perspective
Richard Ngo, Lawrence Chan, Sören Mindermann
65 183 0
30 Aug 2022

Counterfactual harm
Jonathan G. Richens, R. Beard, Daniel H. Thompson
29 27 0
27 Apr 2022

A Complete Criterion for Value of Information in Soluble Influence Diagrams
Chris van Merwijk, Ryan Carey, Tom Everitt
24 5 0
23 Feb 2022