Modeling AGI Safety Frameworks with Causal Influence Diagrams

20 June 2019

Victoria Krakovna

Papers citing "Modeling AGI Safety Frameworks with Causal Influence Diagrams"

Worldwide AI Ethics: a review of 200 guidelines and recommendations for AI governance. N. Corrêa, Camila Galvão, J. Santos, C. Pino, Edson Pontes Pinto, ..., Diogo Massmann, Rodrigo Mambrini, Luiza Galvao, Edmund Terem, Nythamar Fernandes de Oliveira. 23 Jun 2022.
Counterfactual harm. Jonathan G. Richens, R. Beard, Daniel H. Thompson. 27 Apr 2022.
A Complete Criterion for Value of Information in Soluble Influence Diagrams. Chris van Merwijk, Ryan Carey, Tom Everitt. 23 Feb 2022.
Alignment of Language Agents. Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik, G. Irving. 26 Mar 2021.
Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice. Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge. 09 Feb 2021.
Agent Incentives: A Causal Perspective. Tom Everitt, Ryan Carey, Eric D. Langlois, Pedro A. Ortega, Shane Legg. 02 Feb 2021.
Counterfactual Planning in AGI Systems. K. Holtman. 29 Jan 2021.
Avoiding Tampering Incentives in Deep RL via Decoupled Approval. J. Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg. 17 Nov 2020.
Incentives for Responsiveness, Instrumental Control and Impact. Ryan Carey, Eric D. Langlois, Chris van Merwijk, Shane Legg, Tom Everitt. 20 Jan 2020.
Superintelligence Safety: A Requirements Engineering Perspective. H. Kaindl, Jonas Ferdigg. 26 Sep 2019.
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective. Tom Everitt, Marcus Hutter, Ramana Kumar, Victoria Krakovna. 13 Aug 2019.