ResearchTrend.AI
Is Power-Seeking AI an Existential Risk?


16 June 2022
Joseph Carlsmith
    ELM

Papers citing "Is Power-Seeking AI an Existential Risk?"

19 / 19 papers shown
The Steganographic Potentials of Language Models
Artem Karpov
Tinuade Adeleke
Seong Hah Cho
Natalia Perez-Campanero
32
0
0
06 May 2025
What Is AI Safety? What Do We Want It to Be?
Jacqueline Harding
Cameron Domenico Kirk-Giannini
66
0
0
05 May 2025
Hardware-Enabled Mechanisms for Verifying Responsible AI Development
Aidan O'Gara
Gabriel Kulp
Will Hodgkins
James Petrie
Vincent Immler
Aydin Aysu
K. Basu
S. Bhasin
S. Picek
Ankur Srivastava
19
0
0
02 Apr 2025
Two Types of AI Existential Risk: Decisive and Accumulative
Atoosa Kasirzadeh
57
14
0
20 Jan 2025
Principles for Responsible AI Consciousness Research
Patrick Butlin
Theodoros Lappas
38
1
0
13 Jan 2025
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
38
0
0
30 Jun 2024
The Dual Imperative: Innovation and Regulation in the AI Era
Paulo Carvao
31
0
0
23 May 2024
Societal Adaptation to Advanced AI
Jamie Bernardi
Gabriel Mukobi
Hilary Greaves
Lennart Heim
Markus Anderljung
40
4
0
16 May 2024
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang
Haoyu Bu
Hui Wen
Yu Chen
Lun Li
Hongsong Zhu
28
36
0
06 May 2024
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar
18
6
0
27 Oct 2023
Power-seeking can be probable and predictive for trained agents
Victoria Krakovna
János Kramár
TDI
27
16
0
13 Apr 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
24
126
0
06 Apr 2023
Unifying Grokking and Double Descent
Peter W. Battaglia
David Raposo
Kelsey
32
31
0
10 Mar 2023
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
33
473
0
19 Oct 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
John J. Nay
ELM
AILaw
84
27
0
14 Sep 2022
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
52
181
0
30 Aug 2022
Parametrically Retargetable Decision-Makers Tend To Seek Power
Alexander Matt Turner
Prasad Tadepalli
10
18
0
27 Jun 2022
X-Risk Analysis for AI Research
Dan Hendrycks
Mantas Mazeika
27
67
0
13 Jun 2022
AI safety via debate
G. Irving
Paul Christiano
Dario Amodei
201
199
0
02 May 2018