arXiv:2206.13353
Is Power-Seeking AI an Existential Risk?
16 June 2022
Joseph Carlsmith
ELM
Papers citing "Is Power-Seeking AI an Existential Risk?"
19 / 19 papers shown
Title
The Steganographic Potentials of Language Models
Artem Karpov
Tinuade Adeleke
Seong Hah Cho
Natalia Perez-Campanero
32
0
0
06 May 2025
What Is AI Safety? What Do We Want It to Be?
Jacqueline Harding
Cameron Domenico Kirk-Giannini
66
0
0
05 May 2025
Hardware-Enabled Mechanisms for Verifying Responsible AI Development
Aidan O'Gara
Gabriel Kulp
Will Hodgkins
James Petrie
Vincent Immler
Aydin Aysu
K. Basu
S. Bhasin
S. Picek
Ankur Srivastava
19
0
0
02 Apr 2025
Two Types of AI Existential Risk: Decisive and Accumulative
Atoosa Kasirzadeh
57
14
0
20 Jan 2025
Principles for Responsible AI Consciousness Research
Patrick Butlin
Theodoros Lappas
38
1
0
13 Jan 2025
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
38
0
0
30 Jun 2024
The Dual Imperative: Innovation and Regulation in the AI Era
Paulo Carvao
31
0
0
23 May 2024
Societal Adaptation to Advanced AI
Jamie Bernardi
Gabriel Mukobi
Hilary Greaves
Lennart Heim
Markus Anderljung
40
4
0
16 May 2024
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang
Haoyu Bu
Hui Wen
Yu Chen
Lun Li
Hongsong Zhu
28
36
0
06 May 2024
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar
18
6
0
27 Oct 2023
Power-seeking can be probable and predictive for trained agents
Victoria Krakovna
János Kramár
TDI
27
16
0
13 Apr 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
24
126
0
06 Apr 2023
Unifying Grokking and Double Descent
Peter W. Battaglia
David Raposo
Kelsey
32
31
0
10 Mar 2023
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
33
473
0
19 Oct 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
John J. Nay
ELM
AILaw
84
27
0
14 Sep 2022
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
52
181
0
30 Aug 2022
Parametrically Retargetable Decision-Makers Tend To Seek Power
Alexander Matt Turner
Prasad Tadepalli
10
18
0
27 Jun 2022
X-Risk Analysis for AI Research
Dan Hendrycks
Mantas Mazeika
27
67
0
13 Jun 2022
AI safety via debate
G. Irving
Paul Christiano
Dario Amodei
201
199
0
02 May 2018