Embedded Agency

25 February 2019

Papers citing "Embedded Agency"


Title | Authors | Tags | Counts | Date
Representation Learning on a Random Lattice | Aryeh Brill | OOD, FAtt, AI4CE | 73, 0, 0 | 28 Apr 2025
A Theory of Bounded Inductive Rationality | Caspar Oesterheld, A. Demski, Vincent Conitzer | - | 14, 7, 0 | 11 Jul 2023
GPT-NeoX-20B: An Open-Source Autoregressive Language Model | Sid Black, Stella Biderman, Eric Hallahan, Quentin G. Anthony, Leo Gao, ..., Shivanshu Purohit, Laria Reynolds, J. Tow, Benqi Wang, Samuel Weinbach | - | 96, 801, 0 | 14 Apr 2022
Temporal Inference with Finite Factored Sets | Scott Garrabrant | - | 11, 2, 0 | 23 Sep 2021
Goal Misgeneralization in Deep Reinforcement Learning | L. Langosco, Jack Koch, Lee D. Sharkey, J. Pfau, Laurent Orseau, David M. Krueger | - | 27, 78, 0 | 28 May 2021
Avoiding Tampering Incentives in Deep RL via Decoupled Approval | J. Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg | - | 26, 14, 0 | 17 Nov 2020
REALab: An Embedded Perspective on Tampering | Ramana Kumar, J. Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg | - | 22, 10, 0 | 17 Nov 2020
Purely Bayesian counterfactuals versus Newcomb's paradox | L. Hoang | - | 11, 0, 0 | 10 Aug 2020
Implications of Quantum Computing for Artificial Intelligence alignment research | Jaime Sevilla, Pablo Moreno | - | 11, 1, 0 | 19 Aug 2019
Scalable agent alignment via reward modeling: a research direction | Jan Leike, David M. Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg | - | 34, 395, 0 | 19 Nov 2018