AI Risk Management Should Incorporate Both Safety and Security
arXiv:2405.19524, 29 May 2024
Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J. Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal
Tags: AAML

Papers citing "AI Risk Management Should Incorporate Both Safety and Security" (22 papers)

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han. 09 Oct 2024.

Differentially Private Kernel Density Estimation
Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu. 03 Sep 2024.

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson. 07 Feb 2024. Tags: AAML.

Private Fine-tuning of Large Language Models with Zeroth-order Optimization
Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, Prateek Mittal. 09 Jan 2024.

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea. 03 Jan 2024.

From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge. 18 Dec 2023.

Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, Boaz Barak. 07 Nov 2023. Tags: WaLM.

Investigating the Catastrophic Forgetting in Multimodal Large Language Models
Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma. 19 Sep 2023. Tags: VLM, MLLM, CLL.

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins
Umar Iqbal, Tadayoshi Kohno, Franziska Roesner. 19 Sep 2023. Tags: ELM, SILM.

Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein. 01 May 2023. Tags: SILM.

Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks
Yiwei Lu, Gautam Kamath, Yaoliang Yu. 07 Mar 2023. Tags: AAML.

Amplifying Membership Exposure via Data Poisoning
Yufei Chen, Chao Shen, Yun Shen, Cong Wang, Yang Zhang. 01 Nov 2022. Tags: AAML.

Explicit Tradeoffs between Adversarial and Natural Distributional Robustness
Mazda Moayeri, Kiarash Banihashem, Soheil Feizi. 15 Sep 2022. Tags: OOD.

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark. 23 Aug 2022.

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. 04 Mar 2022. Tags: OSLM, ALM.

Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt. 28 Sep 2021.

Challenges in Detoxifying Language Models
Johannes Welbl, Amelia Glaese, Jonathan Uesato, Sumanth Dathathri, John F. J. Mellor, Lisa Anne Hendricks, Kirsty Anderson, Pushmeet Kohli, Ben Coppin, Po-Sen Huang. 15 Sep 2021. Tags: LM&MA.

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant. 18 Apr 2021. Tags: VPVLM.

Adversarial Attacks On Multi-Agent Communication
James Tu, Tsun-Hsuan Wang, Jingkang Wang, Sivabalan Manivasagam, Mengye Ren, Raquel Urtasun. 17 Jan 2021. Tags: AAML.

Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, Colin Raffel. 14 Dec 2020. Tags: MLAU, SILM.

RobustBench: a standardized adversarial robustness benchmark
Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein. 19 Oct 2020. Tags: VLM.

Cryptanalytic Extraction of Neural Network Models
Nicholas Carlini, Matthew Jagielski, Ilya Mironov. 10 Mar 2020. Tags: FedML, MLAU, MIACV, AAML.