1912.01683
Optimal Policies Tend to Seek Power
Neural Information Processing Systems (NeurIPS), 2019
3 December 2019
Alexander Matt Turner
Logan Smith
Rohin Shah
Andrew Critch
Prasad Tadepalli
Papers citing
"Optimal Policies Tend to Seek Power"
50 / 62 papers shown
Title
Agentic Misalignment: How LLMs Could Be Insider Threats
Aengus Lynch
Benjamin Wright
Caleb Larson
Stuart Ritchie
Sören Mindermann
Ethan Perez
Kevin K. Troy
Evan Hubinger
104
31
0
05 Oct 2025
Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Antoine Maier
Aude Maier
Tom David
76
0
0
03 Oct 2025
Estimating the Empowerment of Language Model Agents
Jinyeop Song
Jeff Gore
Max Kleiman-Weiner
118
1
0
26 Sep 2025
HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants
Benjamin Sturgeon
Daniel Samuelson
Jacob Haimes
Jacy Reese Anthis
183
1
0
10 Sep 2025
Understanding Action Effects through Instrumental Empowerment in Multi-Agent Reinforcement Learning
Ardian Selmonaj
M. Strupl
Oleg Szehr
Alessandro Antonucci
115
0
0
21 Aug 2025
AI Testing Should Account for Sophisticated Strategic Behaviour
Vojtěch Kovařík
Eric Olav Chen
Sami Petersen
Alexis Ghersengorin
Vincent Conitzer
105
1
0
19 Aug 2025
Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power
Jobst Heitzig
Ram Potham
105
0
0
31 Jul 2025
Misalignment or misuse? The AGI alignment tradeoff
Philosophical Studies (Philos. Stud.), 2025
Max Hellrigel-Holderbaum
Leonard Dung
234
2
0
04 Jun 2025
The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?
Djallel Bouneffouf
Matthew D Riemer
Kush R. Varshney
217
0
0
02 Jun 2025
Will artificial agents pursue power by default?
Christian Tarsney
96
0
0
02 Jun 2025
Plasticity as the Mirror of Empowerment
David Abel
Michael Bowling
André Barreto
Will Dabney
Shi Dong
...
Doina Precup
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
AI4CE
341
2
0
15 May 2025
Societal Alignment Frameworks Can Improve LLM Alignment
Karolina Stańczak
Nicholas Meade
Mehar Bhatia
Hattie Zhou
Konstantin Böttinger
...
Timothy P. Lillicrap
Ana Marasović
Sylvie Delacroix
Gillian K. Hadfield
Siva Reddy
972
3
0
27 Feb 2025
Universal AI maximizes Variational Empowerment
Artificial General Intelligence (AGI), 2025
Yusuke Hayashi
Koichi Takahashi
142
0
0
20 Feb 2025
Learning to Assist Humans without Inferring Rewards
Neural Information Processing Systems (NeurIPS), 2024
Vivek Myers
Evan Ellis
Sergey Levine
Benjamin Eysenbach
Anca Dragan
506
10
0
17 Jan 2025
Towards evaluations-based safety cases for AI scheming
Mikita Balesni
Marius Hobbhahn
David Lindner
Alexander Meinke
Tomek Korbak
...
Dan Braun
Bilal Chughtai
Owain Evans
Daniel Kokotajlo
Lucius Bushnaq
ELM
237
21
0
29 Oct 2024
Potential-Based Intrinsic Motivation: Preserving Optimality With Complex, Non-Markovian Shaping Rewards
Grant C. Forbes
Leonardo Villalobos-Arias
Jianxun Wang
Arnav Jhala
David L. Roberts
219
2
0
16 Oct 2024
On Goodhart's law, with an application to value alignment
El-Mahdi El-Mhamdi
Lê-Nguyên Hoang
103
4
0
12 Oct 2024
RL, but don't do anything I wouldn't do
Conference on Uncertainty in Artificial Intelligence (UAI), 2024
Michael K. Cohen
Marcus Hutter
Yoshua Bengio
Stuart J. Russell
OffRL
156
2
0
08 Oct 2024
OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions
Yu-Shin Huang
Peter Just
Krishna Narayanan
Chao Tian
250
15
0
06 Oct 2024
Beyond Preferences in AI Alignment
Philosophical Studies (Philos. Stud.), 2024
Tan Zhi-Xuan
Micah Carroll
Matija Franklin
Hal Ashton
307
37
0
30 Aug 2024
Evaluating AI Evaluation: Perils and Prospects
John Burden
ELM
192
13
0
12 Jul 2024
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
412
1
0
30 Jun 2024
Games of Knightian Uncertainty as AGI testbeds
Spyridon Samothrakis
Dennis J. N. J. Soemers
Damian Machlanski
226
1
0
26 Jun 2024
The Benefits of Power Regularization in Cooperative Reinforcement Learning
Michelle Li
Michael Dennis
191
3
0
17 Jun 2024
Dishonesty in Helpful and Harmless Alignment
Youcheng Huang
Jingkun Tang
Duanyu Feng
Zheng Zhang
Wenqiang Lei
Jiancheng Lv
Anthony G. Cohn
LLMSV
275
4
0
04 Jun 2024
REvolve: Reward Evolution with Large Language Models using Human Feedback
Rishi Hazra
Alkis Sygkounas
Andreas Persson
Amy Loutfi
Pedro Zuidberg Dos Martires
318
3
0
03 Jun 2024
Contestable AI needs Computational Argumentation
International Conference on Principles of Knowledge Representation and Reasoning (KR), 2024
Francesco Leofante
Hamed Ayoobi
Adam Dejl
Gabriel Freedman
Deniz Gorur
...
Anna Rapberger
Fabrizio Russo
Xiang Yin
Dekai Zhang
Francesca Toni
204
12
0
17 May 2024
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Chuang Liu
Linhao Yu
Jiaxuan Li
Renren Jin
Yufei Huang
...
Tao Liu
Jinwang Song
Hongying Zan
Sun Li
Deyi Xiong
ELM
291
13
0
18 Mar 2024
The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley
137
17
0
07 Mar 2024
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Cassidy Laidlaw
Shivam Singhal
Anca Dragan
AAML
278
11
0
05 Mar 2024
Quantifying stability of non-power-seeking in artificial agents
Evan Ryan Gunter
Yevgeny Liokumovich
Victoria Krakovna
255
2
0
07 Jan 2024
Measuring Value Alignment
Fazl Barez
Juil Sock
87
5
0
23 Dec 2023
Preventing Language Models From Hiding Their Reasoning
Fabien Roger
Ryan Greenblatt
LRM
408
28
0
27 Oct 2023
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar
132
9
0
27 Oct 2023
Managing extreme AI risks amid rapid progress
Yoshua Bengio
Geoffrey Hinton
Andrew Yao
Dawn Song
Pieter Abbeel
...
Juil Sock
Stuart J. Russell
Daniel Kahneman
J. Brauner
Sören Mindermann
280
30
0
26 Oct 2023
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
International Conference on Learning Representations (ICLR), 2023
Rui Zheng
Wei Shen
Yuan Hua
Wenbin Lai
Jiajun Sun
...
Xiao Wang
Haoran Huang
Tao Gui
Xuanjing Huang
257
22
0
18 Oct 2023
AI Systems of Concern
Kayla Matteucci
S. Avin
Fazl Barez
Seán Ó hÉigeartaigh
188
1
0
09 Oct 2023
Large Language Model Alignment: A Survey
Shangda Wu
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
320
273
0
26 Sep 2023
A Case for AI Safety via Law
Jeffrey W. Johnston
214
1
0
31 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
321
689
0
27 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
327
151
0
06 Jul 2023
Human Control: Definitions and Algorithms
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Ryan Carey
Tom Everitt
200
12
0
31 May 2023
Incentivizing honest performative predictions with proper scoring rules
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Caspar Oesterheld
Johannes Treutlein
Emery Cooper
Rubi Hudson
225
9
0
28 May 2023
Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
...
Vijay Bolina
Jack Clark
Yoshua Bengio
Paul Christiano
Allan Dafoe
ELM
245
193
0
24 May 2023
Selection for short-term empowerment accelerates the evolution of homeostatic neural cellular automata
Annual Conference on Genetic and Evolutionary Computation (GECCO), 2023
Caitlin Grasso
Josh Bongard
110
3
0
24 May 2023
Power-seeking can be probable and predictive for trained agents
Victoria Krakovna
János Kramár
TDI
128
20
0
13 Apr 2023
Eight Things to Know about Large Language Models
Sam Bowman
ALM
287
134
0
02 Apr 2023
Unifying Grokking and Double Descent
Peter W. Battaglia
David Raposo
Kelsey
243
46
0
10 Mar 2023
Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards
Social Science Research Network (SSRN), 2023
John J. Nay
ELM
AILaw
218
19
0
24 Jan 2023
Scaling Laws for Reward Model Overoptimization
International Conference on Machine Learning (ICML), 2022
Leo Gao
John Schulman
Jacob Hilton
ALM
289
751
0
19 Oct 2022