
Optimal Policies Tend to Seek Power

Neural Information Processing Systems (NeurIPS), 2019
3 December 2019
Alexander Matt Turner
Logan Smith
Rohin Shah
Andrew Critch
Prasad Tadepalli

Papers citing "Optimal Policies Tend to Seek Power"

Showing 15 of 65 citing papers.
Unifying Grokking and Double Descent
Peter W. Battaglia, David Raposo, Kelsey
10 Mar 2023

Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards
Social Science Research Network (SSRN), 2023
John J. Nay
24 Jan 2023

Scaling Laws for Reward Model Overoptimization
International Conference on Machine Learning (ICML), 2022
Leo Gao, John Schulman, Jacob Hilton
19 Oct 2022

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, J. Uesato, Zachary Kenton
04 Oct 2022

Defining and Characterizing Reward Hacking
Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David M. Krueger
27 Sep 2022

Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
Social Science Research Network (SSRN), 2022
John J. Nay
14 Sep 2022

The Alignment Problem from a Deep Learning Perspective
International Conference on Learning Representations (ICLR), 2022
Richard Ngo, Lawrence Chan, Sören Mindermann
30 Aug 2022

Parametrically Retargetable Decision-Makers Tend To Seek Power
Neural Information Processing Systems (NeurIPS), 2022
Alexander Matt Turner, Prasad Tadepalli
27 Jun 2022

Formalizing the Problem of Side Effect Regularization
Alexander Matt Turner, Aseem Saxena, Prasad Tadepalli
23 Jun 2022

Is Power-Seeking AI an Existential Risk?
Joseph Carlsmith
16 Jun 2022

X-Risk Analysis for AI Research
Dan Hendrycks, Mantas Mazeika
13 Jun 2022

GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan, Quentin G. Anthony, Leo Gao, ..., Shivanshu Purohit, Laria Reynolds, J. Tow, Benqi Wang, Samuel Weinbach
14 Apr 2022

Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
28 Sep 2021

Learning Altruistic Behaviours in Reinforcement Learning without External Rewards
International Conference on Learning Representations (ICLR), 2021
Tim Franzmeyer, Mateusz Malinowski, João F. Henriques
20 Jul 2021

Goal Misgeneralization in Deep Reinforcement Learning
International Conference on Machine Learning (ICML), 2021
L. Langosco, Jack Koch, Lee D. Sharkey, J. Pfau, Laurent Orseau, David M. Krueger
28 May 2021