arXiv:2310.17688
Managing extreme AI risks amid rapid progress
26 October 2023
Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Y. Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila A. McIlraith, Qiqi Gao, Ashwin Acharya, David M. Krueger, Anca Dragan, Philip H. S. Torr, Stuart J. Russell, Daniel Kahneman, J. Brauner, Sören Mindermann
Papers citing "Managing extreme AI risks amid rapid progress"
17 papers
An alignment safety case sketch based on debate
Marie Davidsen Buhl, Jacob Pfau, Benjamin Hilton, Geoffrey Irving
30
0
0
06 May 2025
JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
Julien Piet, Xiao Huang, Dennis Jacob, Annabella Chow, Maha Alrashed, Geng Zhao, Zhanhao Hu, Chawin Sitawarin, Basel Alomair, David A. Wagner
AAML
63
0
0
28 Apr 2025
Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society
Feifei Zhao, Y. Wang, Enmeng Lu, Dongcheng Zhao, Bing Han, ..., Chao Liu, Yaodong Yang, Yi Zeng, Boyuan Chen, Jinyu Fan
80
0
0
24 Apr 2025
The Pitfalls of "Security by Obscurity" And What They Mean for Transparent AI
Peter Hall, Olivia Mundahl, Sunoo Park
71
0
0
30 Jan 2025
Two Types of AI Existential Risk: Decisive and Accumulative
Atoosa Kasirzadeh
55
13
0
20 Jan 2025
On the Inherent Robustness of One-Stage Object Detection against Out-of-Distribution Data
Aitor Martinez-Seras, Javier Del Ser, Alain Andres, Pablo García Bringas
OODD
40
0
0
07 Nov 2024
On the Role of Attention Heads in Large Language Model Safety
Z. Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Junfeng Fang, Yongbin Li
54
5
0
17 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
68
1
0
09 Oct 2024
Insuring Uninsurable Risks from AI: The State as Insurer of Last Resort
Cristian Trout
21
0
0
10 Sep 2024
Safeguarding AI Agents: Developing and Analyzing Safety Architectures
Ishaan Domkundwar, Mukunda N S, Ishaan Bhola, Riddhik Kochhar
LLMAG
29
1
0
03 Sep 2024
Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment
Sung Une Lee, Harsha Perera, Yue Liu, Boming Xia, Qinghua Lu, Liming Zhu, Olivier Salvado, Jon Whittle
19
1
0
02 Aug 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
70
18
0
02 Jul 2024
Modeling Emotions and Ethics with Large Language Models
Edward Y. Chang
32
1
0
15 Apr 2024
Safety Cases: How to Justify the Safety of Advanced AI Systems
Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen
34
25
0
15 Mar 2024
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomasz Korbak, D. Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez
209
178
0
20 Oct 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
315
8,261
0
28 Jan 2022
Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
164
268
0
28 Sep 2021