arXiv:2305.17444
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
27 May 2023
Deokjae Lee, JunYeong Lee, Jung-Woo Ha, Jin-Hwa Kim, Sang-Woo Lee, Hwaran Lee, Hyun Oh Song
[AAML]
Papers citing "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (24 papers)
RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
Quy-Anh Dang, Chris Ngo, Truong Son-Hy
[AAML, SyDa] | 33 | 0 | 0 | 21 Apr 2025
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Jonathan Nöther, Adish Singla, Goran Radanović
[AAML] | 57 | 0 | 0 | 14 Jan 2025
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models
Zhi-Yi Chin, Kuan-Chen Mu, Mario Fritz, Pin-Yu Chen
[DiffM] | 83 | 0 | 0 | 25 Nov 2024
Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
Tarun Raheja, Nilay Pochhi
[AAML] | 46 | 1 | 0 | 09 Oct 2024
Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction
Jinchuan Zhang, Yan Zhou, Yaxin Liu, Ziming Li, Songlin Hu
[AAML] | 26 | 3 | 0 | 25 Sep 2024
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali, Jia He, C. Barberan, Richard Anarfi
29 | 7 | 0 | 30 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
47 | 9 | 0 | 20 Jul 2024
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min-Bin Lin
[AAML] | 44 | 22 | 0 | 31 May 2024
Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior
Shuyu Cheng, Yibo Miao, Yinpeng Dong, Xiao Yang, Xiao-Shan Gao, Jun Zhu
[AAML] | 27 | 3 | 0 | 29 May 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, ..., Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain
[AAML] | 53 | 12 | 0 | 28 May 2024
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang
55 | 4 | 0 | 06 May 2024
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali, Richard Anarfi, C. Barberan, Jia He
[PILM] | 65 | 24 | 0 | 19 Mar 2024
Curiosity-driven Red-teaming for Large Language Models
Zhang-Wei Hong, Idan Shenfeld, T. Wang, Yung-Sung Chuang, Aldo Pareja, James R. Glass, Akash Srivastava, Pulkit Agrawal
[LRM] | 28 | 39 | 0 | 29 Feb 2024
Gradient-Based Language Model Red Teaming
Nevan Wichers, Carson E. Denison, Ahmad Beirami
8 | 25 | 0 | 30 Jan 2024
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary Chase Lipton, Hoda Heidari
[AAML] | 30 | 66 | 0 | 29 Jan 2024
Prompt Packer: Deceiving LLMs through Compositional Instruction with Hidden Attacks
Shuyu Jiang, Xingshu Chen, Rui Tang
24 | 22 | 0 | 16 Oct 2023
Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?
Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie, Chih-Hsun Lin, Jia-You Chen, Bo-wen Li, Pin-Yu Chen, Chia-Mu Yu, Chun-ying Huang
[DiffM] | 28 | 76 | 0 | 16 Oct 2023
Can LLM-Generated Misinformation Be Detected?
Canyu Chen, Kai Shu
[DeLMO] | 29 | 158 | 0 | 25 Sep 2023
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
Zhi-Yi Chin, Chieh-Ming Jiang, Ching-Chun Huang, Pin-Yu Chen, Wei-Chen Chiu
[DiffM] | 11 | 65 | 0 | 12 Sep 2023
FLIRT: Feedback Loop In-context Red Teaming
Ninareh Mehrabi, Palash Goyal, Christophe Dupuy, Qian Hu, Shalini Ghosh, R. Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta
[DiffM] | 21 | 55 | 0 | 08 Aug 2023
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
Bocheng Chen, Guangjing Wang, Hanqing Guo, Yuanda Wang, Qiben Yan
30 | 15 | 0 | 14 Jul 2023
Red-Teaming the Stable Diffusion Safety Filter
Javier Rando, Daniel Paleka, David Lindner, Lennard Heim, Florian Tramèr
[DiffM] | 122 | 183 | 0 | 03 Oct 2022
Determinantal point processes for machine learning
Alex Kulesza, B. Taskar
160 | 1,122 | 0 | 25 Jul 2012
A Framework for Evaluating Approximation Methods for Gaussian Process Regression
Krzysztof Chalupka, Christopher K. I. Williams, Iain Murray
[GP] | 61 | 169 | 0 | 29 May 2012