Black Box Adversarial Prompting for Foundation Models

8 February 2023
N. Maus
Patrick Chao
Eric Wong
Jacob R. Gardner
    VLM

Papers citing "Black Box Adversarial Prompting for Foundation Models"

50 / 52 papers shown
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
Shashank Kapoor
Sanjay Surendranath Girija
Lakshit Arora
Dipen Pradhan
Ankit Shetgaonkar
Aman Raj
AAML
69
0
0
06 May 2025
The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning
Siyi Chen
Yimeng Zhang
Sijia Liu
Q. Qu
AAML
138
0
0
30 Apr 2025
Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models
J. Liu
Zhaoxin Wang
Handing Wang
Cong Tian
Yaochu Jin
26
0
0
15 Apr 2025
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González
Andrés Herrera-Poyatos
Cristina Zuheros
David Herrera-Poyatos
Virilo Tejedor
F. Herrera
AAML
19
0
0
07 Apr 2025
Pay More Attention to the Robustness of Prompt for Instruction Data Mining
Qiang Wang
Dawei Feng
Xu Zhang
Ao Shen
Yang Xu
Bo Ding
H. Wang
AAML
46
0
0
31 Mar 2025
Augmented Adversarial Trigger Learning
Zhe Wang
Yanjun Qi
53
0
0
16 Mar 2025
Jailbreaking to Jailbreak
Jeremy Kritz
Vaughn Robinson
Robert Vacareanu
Bijan Varjavand
Michael Choi
Bobby Gogov
Scale Red Team
Summer Yue
Willow Primack
Zifan Wang
194
1
0
09 Feb 2025
RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process
Peiran Wang
Xiaogeng Liu
Chaowei Xiao
AAML
29
3
0
11 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min Lin
39
8
0
09 Oct 2024
From Transparency to Accountability and Back: A Discussion of Access and Evidence in AI Auditing
Sarah H. Cen
Rohan Alur
29
1
0
07 Oct 2024
SteerDiff: Steering towards Safe Text-to-Image Diffusion Models
Hongxiang Zhang
Yifeng He
Hao Chen
23
2
0
03 Oct 2024
Adversarial Attacks on Parts of Speech: An Empirical Study in Text-to-Image Generation
G M Shahariar
Jia Chen
Jiachen Li
Yue Dong
29
0
0
21 Sep 2024
Acceptable Use Policies for Foundation Models
Kevin Klyman
31
14
0
29 Aug 2024
LLM-PBE: Assessing Data Privacy in Large Language Models
Qinbin Li
Junyuan Hong
Chulin Xie
Jeffrey Tan
Rachel Xin
...
Dan Hendrycks
Zhangyang Wang
Bo Li
Bingsheng He
Dawn Song
ELM
PILM
38
12
0
23 Aug 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su
Mingyu Lee
SangKeun Lee
38
8
0
02 Aug 2024
Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey
Chenyu Zhang
Mingwang Hu
Wenhui Li
Lanjun Wang
39
15
0
10 Jul 2024
Serial Position Effects of Large Language Models
Xiaobo Guo
Soroush Vosoughi
43
3
0
23 Jun 2024
Security of AI Agents
Yifeng He
Ethan Wang
Yuyang Rong
Zifei Cheng
Hao Chen
LLMAG
34
7
0
12 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong
Ronghui Mu
Yanghao Zhang
Siqi Sun
Tianle Zhang
...
Yi Qi
Jinwei Hu
Jie Meng
Saddek Bensalem
Xiaowei Huang
OffRL
KELM
AILaw
35
17
0
03 Jun 2024
Automatic Jailbreaking of the Text-to-Image Generative AI Systems
Minseon Kim
Hyomin Lee
Boqing Gong
Huishuai Zhang
Sung Ju Hwang
32
11
0
26 May 2024
Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models
Yimeng Zhang
Xin Chen
Jinghan Jia
Yihua Zhang
Chongyu Fan
Jiancheng Liu
Mingyi Hong
Ke Ding
Sijia Liu
DiffM
36
52
0
24 May 2024
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus
Arman Zharmagambetov
Chuan Guo
Brandon Amos
Yuandong Tian
AAML
55
55
0
21 Apr 2024
Ethical Framework for Responsible Foundational Models in Medical Imaging
Abhijit Das
Debesh Jha
Jasmer Sanjotra
Onkar Susladkar
Suramyaa Sarkar
A. Rauniyar
Nikhil Tomar
Vanshali Sharma
Ulas Bagci
MedIm
77
0
0
14 Apr 2024
A Safe Harbor for AI Evaluation and Red Teaming
Shayne Longpre
Sayash Kapoor
Kevin Klyman
Ashwin Ramaswami
Rishi Bommasani
...
Daniel Kang
Sandy Pentland
Arvind Narayanan
Percy Liang
Peter Henderson
49
38
0
07 Mar 2024
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
Yifan Zeng
Yiran Wu
Xiao Zhang
Huazheng Wang
Qingyun Wu
LLMAG
AAML
42
59
0
02 Mar 2024
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
...
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktaschel
Roberta Raileanu
SyDa
70
62
0
26 Feb 2024
ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
Yi Zhang
Yun Tang
Wenjie Ruan
Xiaowei Huang
Siddartha Khastgir
P. Jennings
Xingyu Zhao
AAML
27
4
0
23 Feb 2024
Coercing LLMs to do and reveal (almost) anything
Jonas Geiping
Alex Stein
Manli Shu
Khalid Saifullah
Yuxin Wen
Tom Goldstein
AAML
34
43
0
21 Feb 2024
A Bayesian approach for prompt optimization in pre-trained language models
Antonio Sabbatella
Andrea Ponti
Antonio Candelieri
I. Giordani
F. Archetti
23
1
0
01 Dec 2023
MMA-Diffusion: MultiModal Attack on Diffusion Models
Yijun Yang
Ruiyuan Gao
Xiaosen Wang
Tsung-Yi Ho
Nan Xu
Qiang Xu
27
61
0
29 Nov 2023
Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang
Xiangyu Zhou
Dongxiao Zhu
30
32
0
16 Nov 2023
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Li Xu
Jun Liu
MU
58
0
0
16 Nov 2023
Do LLMs exhibit human-like response biases? A case study in survey design
Lindia Tjuatja
Valerie Chen
Sherry Tongshuang Wu
Ameet Talwalkar
Graham Neubig
27
80
0
07 Nov 2023
Can LLMs Follow Simple Rules?
Norman Mu
Sarah Chen
Zifan Wang
Sizhe Chen
David Karamardian
Lulwa Aljeraisy
Basel Alomair
Dan Hendrycks
David A. Wagner
ALM
23
26
0
06 Nov 2023
Joint Composite Latent Space Bayesian Optimization
Natalie Maus
Zhiyuan Jerry Lin
Maximilian Balandat
E. Bakshy
BDL
33
2
0
03 Nov 2023
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision
Bobby Azad
Reza Azad
Sania Eskandari
Afshin Bozorgpour
A. Kazerouni
I. Rekik
Dorit Merhof
VLM
MedIm
98
59
0
28 Oct 2023
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu
Ruiyi Zhang
Bang An
Gang Wu
Joe Barrow
Zichao Wang
Furong Huang
A. Nenkova
Tong Sun
SILM
AAML
30
40
0
23 Oct 2023
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
Yimeng Zhang
Jinghan Jia
Xin Chen
Aochuan Chen
Yihua Zhang
Jiancheng Liu
Ke Ding
Sijia Liu
DiffM
22
82
0
18 Oct 2023
Jailbreaking Black Box Large Language Models in Twenty Queries
Patrick Chao
Alexander Robey
Edgar Dobriban
Hamed Hassani
George J. Pappas
Eric Wong
AAML
53
570
0
12 Oct 2023
Low-Resource Languages Jailbreak GPT-4
Zheng-Xin Yong
Cristina Menghini
Stephen H. Bach
SILM
23
169
0
03 Oct 2023
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
Zhi-Yi Chin
Chieh-Ming Jiang
Ching-Chun Huang
Pin-Yu Chen
Wei-Chen Chiu
DiffM
11
65
0
12 Sep 2023
Open Sesame! Universal Black Box Jailbreaking of Large Language Models
Raz Lapid
Ron Langberg
Moshe Sipper
AAML
16
103
0
04 Sep 2023
Effective Prompt Extraction from Language Models
Yiming Zhang
Nicholas Carlini
Daphne Ippolito
MIACV
SILM
30
35
0
13 Jul 2023
Discovering the Hidden Vocabulary of DALLE-2
Giannis Daras
A. Dimakis
129
64
0
01 Jun 2022
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Stephen H. Bach
Victor Sanh
Zheng-Xin Yong
Albert Webson
Colin Raffel
...
Khalid Almubarak
Xiangru Tang
Dragomir R. Radev
Mike Tian-Jian Jiang
Alexander M. Rush
VLM
225
338
0
02 Feb 2022
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
279
1,120
0
18 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,777
0
24 Feb 2021
What Makes Good In-Context Examples for GPT-3?
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
AAML
RALM
275
1,312
0
17 Jan 2021
Certified Robustness to Adversarial Word Substitutions
Robin Jia
Aditi Raghunathan
Kerem Göksel
Percy Liang
AAML
183
290
0
03 Sep 2019
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
245
914
0
21 Apr 2018