Optimal Policies Tend to Seek Power

Neural Information Processing Systems (NeurIPS), 2019
3 December 2019
Alexander Matt Turner
Logan Smith
Rohin Shah
Andrew Critch
Prasad Tadepalli

Papers citing "Optimal Policies Tend to Seek Power"

50 / 65 papers shown
Password-Activated Shutdown Protocols for Misaligned Frontier Agents
Kai Williams
Rohan Subramani
Francis Rhys Ward
68
0
0
29 Nov 2025
Instrumental goals in advanced AI systems: Features to be managed and not failures to be eliminated?
Willem Fourie
99
0
0
29 Oct 2025
Agentic Misalignment: How LLMs Could Be Insider Threats
Aengus Lynch
Benjamin Wright
Caleb Larson
Stuart Ritchie
Sören Mindermann
Ethan Perez
Kevin K. Troy
Evan Hubinger
157
37
0
05 Oct 2025
Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Antoine Maier
Aude Maier
Tom David
96
0
0
03 Oct 2025
Estimating the Empowerment of Language Model Agents
Jinyeop Song
Jeff Gore
Max Kleiman-Weiner
134
1
0
26 Sep 2025
HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants
Benjamin Sturgeon
Daniel Samuelson
Jacob Haimes
Jacy Reese Anthis
207
1
0
10 Sep 2025
Understanding Action Effects through Instrumental Empowerment in Multi-Agent Reinforcement Learning
Ardian Selmonaj
M. Strupl
Oleg Szehr
Alessandro Antonucci
151
0
0
21 Aug 2025
AI Testing Should Account for Sophisticated Strategic Behaviour
Vojtěch Kovařík
Eric Olav Chen
Sami Petersen
Alexis Ghersengorin
Vincent Conitzer
133
1
0
19 Aug 2025
Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power
Jobst Heitzig
Ram Potham
153
0
0
31 Jul 2025
Misalignment or misuse? The AGI alignment tradeoff
Philosophical Studies (Philos. Stud.), 2025
Max Hellrigel-Holderbaum
Leonard Dung
277
2
0
04 Jun 2025
The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?
Djallel Bouneffouf
Matthew D Riemer
Kush R. Varshney
253
0
0
02 Jun 2025
Will artificial agents pursue power by default?
Christian Tarsney
117
0
0
02 Jun 2025
Plasticity as the Mirror of Empowerment
David Abel
Michael Bowling
André Barreto
Will Dabney
Shi Dong
...
Doina Precup
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
AI4CE
414
2
0
15 May 2025
Societal Alignment Frameworks Can Improve LLM Alignment
Karolina Stańczak
Nicholas Meade
Mehar Bhatia
Hattie Zhou
Konstantin Böttinger
...
Timothy P. Lillicrap
Ana Marasović
Sylvie Delacroix
Gillian K. Hadfield
Siva Reddy
1.0K
3
0
27 Feb 2025
Universal AI maximizes Variational Empowerment
Artificial General Intelligence (AGI), 2025
Yusuke Hayashi
Koichi Takahashi
145
0
0
20 Feb 2025
Learning to Assist Humans without Inferring Rewards
Neural Information Processing Systems (NeurIPS), 2024
Vivek Myers
Evan Ellis
Sergey Levine
Benjamin Eysenbach
Anca Dragan
567
10
0
17 Jan 2025
Active Inference and Human-Computer Interaction
R. Murray-Smith
J. Williamson
Sebastian Stein
AI4CE
175
3
0
19 Dec 2024
Towards evaluations-based safety cases for AI scheming
Mikita Balesni
Marius Hobbhahn
David Lindner
Alexander Meinke
Tomek Korbak
...
Dan Braun
Bilal Chughtai
Owain Evans
Daniel Kokotajlo
Lucius Bushnaq
ELM
261
22
0
29 Oct 2024
Potential-Based Intrinsic Motivation: Preserving Optimality With Complex, Non-Markovian Shaping Rewards
Grant C. Forbes
Leonardo Villalobos-Arias
Jianxun Wang
Arnav Jhala
David L. Roberts
263
2
0
16 Oct 2024
On Goodhart's law, with an application to value alignment
El-Mahdi El-Mhamdi
Lê-Nguyên Hoang
152
5
0
12 Oct 2024
RL, but don't do anything I wouldn't do
Conference on Uncertainty in Artificial Intelligence (UAI), 2024
Michael K. Cohen
Marcus Hutter
Yoshua Bengio
Stuart J. Russell
OffRL
180
2
0
08 Oct 2024
OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions
Yu-Shin Huang
Peter Just
Krishna Narayanan
Chao Tian
270
15
0
06 Oct 2024
Beyond Preferences in AI Alignment
Philosophical Studies (Philos. Stud.), 2024
Tan Zhi-Xuan
Micah Carroll
Matija Franklin
Hal Ashton
343
38
0
30 Aug 2024
Evaluating AI Evaluation: Perils and Prospects
John Burden
ELM
220
13
0
12 Jul 2024
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
456
1
0
30 Jun 2024
Games of Knightian Uncertainty as AGI testbeds
Spyridon Samothrakis
Dennis J. N. J. Soemers
Damian Machlanski
282
1
0
26 Jun 2024
The Benefits of Power Regularization in Cooperative Reinforcement Learning
Michelle Li
Michael Dennis
224
3
0
17 Jun 2024
Dishonesty in Helpful and Harmless Alignment
Youcheng Huang
Jingkun Tang
Duanyu Feng
Zheng Zhang
Wenqiang Lei
Jiancheng Lv
Anthony G. Cohn
LLMSV
303
4
0
04 Jun 2024
REvolve: Reward Evolution with Large Language Models using Human Feedback
Rishi Hazra
Alkis Sygkounas
Andreas Persson
Amy Loutfi
Pedro Zuidberg Dos Martires
360
3
0
03 Jun 2024
Contestable AI needs Computational Argumentation
International Conference on Principles of Knowledge Representation and Reasoning (KR), 2024
Francesco Leofante
Hamed Ayoobi
Adam Dejl
Gabriel Freedman
Deniz Gorur
...
Anna Rapberger
Fabrizio Russo
Xiang Yin
Dekai Zhang
Francesca Toni
263
12
0
17 May 2024
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Chuang Liu
Linhao Yu
Jiaxuan Li
Renren Jin
Yufei Huang
...
Tao Liu
Jinwang Song
Hongying Zan
Sun Li
Deyi Xiong
ELM
327
13
0
18 Mar 2024
The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley
157
18
0
07 Mar 2024
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Cassidy Laidlaw
Shivam Singhal
Anca Dragan
AAML
375
11
0
05 Mar 2024
Quantifying stability of non-power-seeking in artificial agents
Evan Ryan Gunter
Yevgeny Liokumovich
Victoria Krakovna
287
2
0
07 Jan 2024
Measuring Value Alignment
Fazl Barez
Juil Sock
109
5
0
23 Dec 2023
Preventing Language Models From Hiding Their Reasoning
Fabien Roger
Ryan Greenblatt
LRM
448
28
0
27 Oct 2023
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar
166
9
0
27 Oct 2023
Managing extreme AI risks amid rapid progress
Yoshua Bengio
Geoffrey Hinton
Andrew Yao
Dawn Song
Pieter Abbeel
...
Juil Sock
Stuart J. Russell
Daniel Kahneman
J. Brauner
Sören Mindermann
340
30
0
26 Oct 2023
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
International Conference on Learning Representations (ICLR), 2023
Rui Zheng
Wei Shen
Yuan Hua
Wenbin Lai
Jiajun Sun
...
Xiao Wang
Haoran Huang
Tao Gui
Xuanjing Huang
285
22
0
18 Oct 2023
AI Systems of Concern
Kayla Matteucci
S. Avin
Fazl Barez
Seán Ó hÉigeartaigh
205
1
0
09 Oct 2023
Large Language Model Alignment: A Survey
Shangda Wu
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
356
278
0
26 Sep 2023
A Case for AI Safety via Law
Jeffrey W. Johnston
258
1
0
31 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
354
712
0
27 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
425
154
0
06 Jul 2023
Human Control: Definitions and Algorithms
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Ryan Carey
Tom Everitt
216
12
0
31 May 2023
Incentivizing honest performative predictions with proper scoring rules
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Caspar Oesterheld
Johannes Treutlein
Emery Cooper
Rubi Hudson
278
10
0
28 May 2023
Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
...
Vijay Bolina
Jack Clark
Yoshua Bengio
Paul Christiano
Allan Dafoe
ELM
289
195
0
24 May 2023
Selection for short-term empowerment accelerates the evolution of homeostatic neural cellular automata
Annual Conference on Genetic and Evolutionary Computation (GECCO), 2023
Caitlin Grasso
Josh Bongard
138
3
0
24 May 2023
Power-seeking can be probable and predictive for trained agents
Victoria Krakovna
János Kramár
TDI
154
22
0
13 Apr 2023
Eight Things to Know about Large Language Models
Sam Bowman
ALM
311
139
0
02 Apr 2023