arXiv:2401.15897

Red-Teaming for Generative AI: Silver Bullet or Security Theater?
29 January 2024
Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary Chase Lipton, Hoda Heidari
AAML

Papers citing "Red-Teaming for Generative AI: Silver Bullet or Security Theater?" (17 of 17 papers shown)

Generative AI in Financial Institution: A Global Survey of Opportunities, Threats, and Regulation
Bikash Saha, Nanda Rani, Sandeep K. Shukla
30 Apr 2025

When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines
Sachin R. Pendse, Darren Gergle, Rachel Kornfield, J. Meyerhoff, David C. Mohr, Jina Suh, Annie Wescott, Casey Williams, J. Schleider
29 Apr 2025

RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Bang An, Shiyue Zhang, Mark Dredze
25 Apr 2025

The Pitfalls of "Security by Obscurity" And What They Mean for Transparent AI
Peter Hall, Olivia Mundahl, Sunoo Park
30 Jan 2025

Dialect prejudice predicts AI decisions about people's character, employability, and criminality
Valentin Hofmann, Pratyusha Kalluri, Dan Jurafsky, Sharese King
01 Mar 2024

Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, ..., Michael Gerovitch, David Bau, Max Tegmark, David M. Krueger, Dylan Hadfield-Menell
AAML
25 Jan 2024

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea
03 Jan 2024

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang
MLLM
09 Nov 2023

Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj, Soujanya Poria
ALM
22 Oct 2023

Probing LLMs for hate speech detection: strengths and vulnerabilities
Sarthak Roy, Ashish Harshavardhan, Animesh Mukherjee, Punyajoy Saha
19 Oct 2023

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh
AAML
16 Oct 2023

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
Alex Mei, Sharon Levy, William Yang Wang
AAML
14 Oct 2023

The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice
Fernando Delgado, Stephen Yang, Michael A. Madaio, Qian Yang
02 Oct 2023

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing
SILM
19 Sep 2023

On the Adversarial Robustness of Multi-Modal Foundation Models
Christian Schlarmann, Matthias Hein
AAML
21 Aug 2023

Red-Teaming the Stable Diffusion Safety Filter
Javier Rando, Daniel Paleka, David Lindner, Lennard Heim, Florian Tramèr
DiffM
03 Oct 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022