Optimal Policies Tend to Seek Power

Neural Information Processing Systems (NeurIPS), 2019

3 December 2019

Alexander Matt Turner

Papers citing "Optimal Policies Tend to Seek Power"

15 / 65 papers shown

Unifying Grokking and Double Descent

Peter W. Battaglia

David Raposo

Kelsey

263

10 Mar 2023

Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards

Social Science Research Network (SSRN), 2023

John J. Nay

ELM AILaw

270

24 Jan 2023

Scaling Laws for Reward Model Overoptimization

International Conference on Machine Learning (ICML), 2022

373

766

19 Oct 2022

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

419

103

04 Oct 2022

Defining and Characterizing Reward Hacking

Joar Skalse

Nikolaus H. R. Howe

Dmitrii Krasheninnikov

David M. Krueger

367

27 Sep 2022

Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans

Social Science Research Network (SSRN), 2022

John J. Nay

ELM AILaw

935

14 Sep 2022

The Alignment Problem from a Deep Learning Perspective

International Conference on Learning Representations (ICLR), 2022

Richard Ngo

Lawrence Chan

Sören Mindermann

511

247

30 Aug 2022

Parametrically Retargetable Decision-Makers Tend To Seek Power

Neural Information Processing Systems (NeurIPS), 2022

Alexander Matt Turner

Prasad Tadepalli

223

27 Jun 2022

Formalizing the Problem of Side Effect Regularization

Alexander Matt Turner

Aseem Saxena

Prasad Tadepalli

288

23 Jun 2022

Is Power-Seeking AI an Existential Risk?

Joseph Carlsmith

ELM

189

118

16 Jun 2022

X-Risk Analysis for AI Research

Dan Hendrycks

Mantas Mazeika

514

13 Jun 2022

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

...

358

947

14 Apr 2022

Unsolved Problems in ML Safety

711

342

28 Sep 2021

Learning Altruistic Behaviours in Reinforcement Learning without External Rewards

International Conference on Learning Representations (ICLR), 2021

Tim Franzmeyer

Mateusz Malinowski

João F. Henriques

323

20 Jul 2021

Goal Misgeneralization in Deep Reinforcement Learning

International Conference on Machine Learning (ICML), 2021

494

111

28 May 2021