AI Research Considerations for Human Existential Safety (ARCHES)
Andrew Critch, David M. Krueger (30 May 2020)
arXiv:2006.04948
Papers citing "AI Research Considerations for Human Existential Safety (ARCHES)" (25 of 25 papers shown)
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills
David Noever (27 Aug 2025)

Agents Require Metacognitive and Strategic Reasoning to Succeed in the Coming Labor Markets
Simpson Zhang, Tennison Liu, M. Schaar (26 May 2025) [LLMAG]

Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains
Botao Amber Hu, Helena Rong (24 May 2025)

Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim, Xiaoyuan Yi, Jing Yao, Muhua Huang, Jinyeong Bak, James Evans, Xing Xie (08 Mar 2025)

Who's Driving? Game Theoretic Path Risk of AGI Development
Robin Young (28 Jan 2025) [LLMSV]

FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas
Yu Lei, Hao Liu, Chengxing Xie, Songjia Liu, Zhiyu Yin, Canyu Chen, Ge Li, Philip Torr, Zhen Wu (14 Oct 2024)

Evaluating AI Evaluation: Perils and Prospects
John Burden (12 Jul 2024) [ELM]

Open-Endedness is Essential for Artificial Superhuman Intelligence
Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal M. P. Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktaschel (06 Jun 2024) [LRM]

Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment
Aidan Kierans, Avijit Ghosh, Hananel Hazan, Shiri Dori-Hacohen (06 Jun 2024)

Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea (25 Apr 2024) [LLMAG]

A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar (27 Oct 2023)

Large Language Model Alignment: A Survey
Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong (26 Sep 2023) [LM&MA]

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell (27 Jul 2023) [ALM, OffRL]

OMNI: Open-endedness via Models of human Notions of Interestingness
Jenny Zhang, Joel Lehman, Kenneth O. Stanley, Jeff Clune (02 Jun 2023) [LRM]

Why is plausibility surprisingly problematic as an XAI criterion?
Weina Jin, Xiaoxiao Li, Ghassan Hamarneh (30 Mar 2023)

Negative Human Rights as a Basis for Long-term AI Safety and Regulation
Ondrej Bajgar, Jan Horenovsky (31 Aug 2022) [FaML]

Worldwide AI Ethics: a review of 200 guidelines and recommendations for AI governance
N. Corrêa, Camila Galvão, J. Santos, C. Pino, Edson Pontes Pinto, ..., Diogo Massmann, Rodrigo Mambrini, Luiza Galvao, Edmund Terem, Nythamar Fernandes de Oliveira (23 Jun 2022)

Is Power-Seeking AI an Existential Risk?
Joseph Carlsmith (16 Jun 2022) [ELM]

Guidelines and Evaluation of Clinical Explainable AI in Medical Image Analysis
Weina Jin, Xiaoxiao Li, M. Fatehi, Ghassan Hamarneh (16 Feb 2022) [ELM, XAI]

Structured access: an emerging paradigm for safe AI deployment
Toby Shevlane (13 Jan 2022)

The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman (15 Oct 2021) [OffRL]

Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt (28 Sep 2021)

Open Problems in Cooperative AI
Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z Leibo, Kate Larson, T. Graepel (15 Dec 2020)

Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
Stephen L. Casper (12 Oct 2020)

On Controllability of AI
Roman V. Yampolskiy (19 Jul 2020)