AI Research Considerations for Human Existential Safety (ARCHES)
Andrew Critch, David M. Krueger (30 May 2020)
arXiv:2006.04948
Papers citing "AI Research Considerations for Human Existential Safety (ARCHES)" (25 of 25 papers shown)
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills
David Noever (27 Aug 2025)

Agents Require Metacognitive and Strategic Reasoning to Succeed in the Coming Labor Markets
Simpson Zhang, Tennison Liu, M. Schaar (26 May 2025) [LLMAG]

Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains
Botao Amber Hu, Helena Rong (24 May 2025)

Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim, Xiaoyuan Yi, Jing Yao, Muhua Huang, Jinyeong Bak, James Evans, Xing Xie (08 Mar 2025)

Who's Driving? Game Theoretic Path Risk of AGI Development
Robin Young (28 Jan 2025) [LLMSV]

FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas
Yu Lei, Hao Liu, Chengxing Xie, Songjia Liu, Zhiyu Yin, Canyu Chen, Ge Li, Philip Torr, Zhen Wu (14 Oct 2024)

Evaluating AI Evaluation: Perils and Prospects
John Burden (12 Jul 2024) [ELM]

Open-Endedness is Essential for Artificial Superhuman Intelligence
Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal M. P. Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktaschel (06 Jun 2024) [LRM]

Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment
Aidan Kierans, Avijit Ghosh, Hananel Hazan, Shiri Dori-Hacohen (06 Jun 2024)

Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea (25 Apr 2024) [LLMAG]

A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar (27 Oct 2023)

Large Language Model Alignment: A Survey
Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong (26 Sep 2023) [LM&MA]

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell (27 Jul 2023) [ALM, OffRL]

OMNI: Open-endedness via Models of human Notions of Interestingness
Jenny Zhang, Joel Lehman, Kenneth O. Stanley, Jeff Clune (02 Jun 2023) [LRM]

Why is plausibility surprisingly problematic as an XAI criterion?
Weina Jin, Xiaoxiao Li, Ghassan Hamarneh (30 Mar 2023)

Negative Human Rights as a Basis for Long-term AI Safety and Regulation
Ondrej Bajgar, Jan Horenovsky (31 Aug 2022) [FaML]

Worldwide AI Ethics: a review of 200 guidelines and recommendations for AI governance
N. Corrêa, Camila Galvão, J. Santos, C. Pino, Edson Pontes Pinto, ..., Diogo Massmann, Rodrigo Mambrini, Luiza Galvao, Edmund Terem, Nythamar Fernandes de Oliveira (23 Jun 2022)

Is Power-Seeking AI an Existential Risk?
Joseph Carlsmith (16 Jun 2022) [ELM]

Guidelines and Evaluation of Clinical Explainable AI in Medical Image Analysis
Weina Jin, Xiaoxiao Li, M. Fatehi, Ghassan Hamarneh (16 Feb 2022) [ELM, XAI]

Structured access: an emerging paradigm for safe AI deployment
Toby Shevlane (13 Jan 2022)

The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman (15 Oct 2021) [OffRL]

Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt (28 Sep 2021)

Open Problems in Cooperative AI
Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z Leibo, Kate Larson, T. Graepel (15 Dec 2020)

Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
Stephen L. Casper (12 Oct 2020)

On Controllability of AI
Roman V. Yampolskiy (19 Jul 2020)