Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Erfan Shayegani, Yue Dong, Nael B. Abu-Ghazaleh
International Conference on Learning Representations (ICLR), 2024
arXiv:2307.14539, 26 July 2023
Papers citing "Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models"
50 of 162 citing papers shown (page 2 of 4)
Misalignment or misuse? The AGI alignment tradeoff
Max Hellrigel-Holderbaum, Leonard Dung. Philosophical Studies (Philos. Stud.), 2025. 04 Jun 2025.
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
Youze Wang, Wenbo Hu, Yinpeng Dong, Jing Liu, Hanwang Zhang, Richang Hong. 02 Jun 2025.
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
Fauzan Farooqui, Thy Thy Tran, Preslav Nakov, Iryna Gurevych. Volume 1 (V1), 2025. 31 May 2025. [MLLM, AAML]
The Security Threat of Compressed Projectors in Large Vision-Language Models
Yudong Zhang, Ruobing Xie, Xingwu Sun, Jiansheng Chen, Zhanhui Kang, Di Wang, Yu Wang. 31 May 2025.
Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack
Juan Ren, Mark Dras, Usman Naseem. 28 May 2025. [AAML]
System Prompt Extraction Attacks and Defenses in Large Language Models
B. Das, M. H. Amini, Yanzhao Wu. 27 May 2025. [AAML]
JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models
Jiaxin Song, Yixu Wang, Jie Li, Rui Yu, Yan Teng, Jiabo He, Yingchun Wang. 26 May 2025. [AAML]
Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts
H. Kim, Minbeom Kim, Wonjun Lee, Kihyun Kim, Changick Kim. 26 May 2025.
Safety Alignment via Constrained Knowledge Unlearning
Zesheng Shi, Yucheng Zhou, Jing Li. 24 May 2025. [MU, KELM, AAML]
Robustifying Vision-Language Models via Dynamic Token Reweighting
Tanqiu Jiang, Jiacheng Liang, Rongyi Zhu, Jiawei Zhou, Fenglong Ma, Ting Wang. 22 May 2025. [AAML]
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu, Zekun Li, Xue Sun, Peipei Li, Shuhan Xia, Xing Cui, Huaibo Huang, Xi Yang, Ran He. 17 May 2025. [EGVM, AAML]
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang, Sarah Monazam Erfani, Yige Li, Jiabo He, James Bailey. 08 May 2025. [AAML]
REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM
Madhur Jindal, Saurabh Deshpande. International Joint Conference on Artificial Intelligence (IJCAI), 2024. 07 May 2025. [AAML]
Adversarial Robustness Analysis of Vision-Language Models in Medical Image Segmentation
Anjila Budathoki, Manish Dhakal. 05 May 2025. [AAML]
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jing Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, ..., Yingshui Tan, Yanan Wu, Jihao Gu, Yongbin Li, Jun Zhu. North American Chapter of the Association for Computational Linguistics (NAACL), 2025. 25 Apr 2025. [MLLM]
Manipulating Multimodal Agents via Cross-Modal Prompt Injection
Le Wang, Zonghao Ying, Tianyuan Zhang, Yaning Tan, Shengshan Hu, Mingchuan Zhang, A. Liu, Xianglong Liu. 19 Apr 2025. [AAML]
QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models
Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Yu Wang. North American Chapter of the Association for Computational Linguistics (NAACL), 2025. 15 Apr 2025. [AAML]
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
Yanbo Wang, Jiyang Guan, Jian Liang, Ran He. Computer Vision and Pattern Recognition (CVPR), 2025. 14 Apr 2025.
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, ..., Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu. 13 Apr 2025. [LLMSV, AAML]
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera. 07 Apr 2025. [AAML]
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
Yifan Wang, Runjin Chen, Bolian Li, David Cho, Yihe Deng, Ruqi Zhang, Tianlong Chen, Zhangyang Wang, A. Grama, Junyuan Hong. 03 Apr 2025. [SyDa]
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Jiawei Wang, Yushen Zuo, Yuanjun Chai, Ziqiang Liu, Yichen Fu, Yichun Feng, Kin-Man Lam. 02 Apr 2025. [AAML, VLM]
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu, Tianyi Gui, Yu Liu, Linli Xu. 02 Apr 2025. [VLM, AAML]
PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang. 02 Apr 2025. [MLLM, AAML]
Emerging Cyber Attack Risks of Medical AI Agents
Jianing Qiu, Lin Li, Jiankai Sun, Hao Wei, Zhe Xu, K. Lam, Wu Yuan. 02 Apr 2025. [AAML]
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani, G M Shahariar, Sara Abdali, Lei Yu, Nael B. Abu-Ghazaleh, Yue Dong. 01 Apr 2025. [AAML]
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang. Computer Vision and Pattern Recognition (CVPR), 2025. 26 Mar 2025. [AAML]
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
Wenhao You, Bryan Hooi, Yiwei Wang, Longji Xu, Zong Ke, Ming Yang, Zi Huang, Yujun Cai. 24 Mar 2025. [AAML]
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer. 17 Mar 2025. [AAML]
Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense
Shuyang Hao, Yijiao Wang, Bryan Hooi, Ming Yang, Qingbin Liu, Chengcheng Tang, Zi Huang, Yujun Cai. 14 Mar 2025. [AAML]
Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization
Shuyang Hao, Yiwei Wang, Bryan Hooi, Qingbin Liu, Muhao Chen, Zi Huang, Yujun Cai. 14 Mar 2025. [AAML, VLM]
Probabilistic Modeling of Jailbreak on Multimodal LLMs: From Quantification to Application
Wenzhuo Xu, Zhipeng Wei, Xiongtao Sun, Zonghao Ying, Deyue Zhang, Dongdong Yang, Xinming Zhang, Quanchen Zou. 10 Mar 2025. [AAML]
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models
Ruidong Chen, Honglin Guo, Lanjun Wang, Chenyu Zhang, Weizhi Nie, An-an Liu. 10 Mar 2025. [DiffM]
RedDiffuser: Red Teaming Vision-Language Models for Toxic Continuation via Reinforced Stable Diffusion
Ruofan Wang, Xiang Zheng, Xinyu Wang, Cong Wang, Jie Zhang, Yu-Gang Jiang. 08 Mar 2025. [VLM]
CeTAD: Towards Certified Toxicity-Aware Distance in Vision Language Models
Xiangyu Yin, Jiaxu Liu, Zhen Chen, Jinwei Hu, Yi Dong, Xiaowei Huang, Wenjie Ruan. 08 Mar 2025. [AAML]
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
Songlong Xing, Zhengyu Zhao, Andrii Zadaianchuk. Computer Vision and Pattern Recognition (CVPR), 2025. 05 Mar 2025. [AAML]
FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts
Ziyi Zhang, Zhen Sun, Zheng Zhang, Jihui Guo, Xinlei He. 28 Feb 2025. [AAML]
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan, Yongtao Wu, Elias Abad Rocamora, Grigorios G. Chrysos, Volkan Cevher. 24 Feb 2025. [AAML]
EigenShield: Causal Subspace Filtering via Random Matrix Theory for Adversarially Robust Vision-Language Models
Nastaran Darabi, Devashri Naik, Sina Tayebati, Dinithi Jayasuriya, Ranganath Krishnan, A. R. Trivedi. 24 Feb 2025. [AAML]
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Yubo Wang, Jianting Tang, Chaohu Liu, Linli Xu. International Conference on Learning Representations (ICLR), 2025. 23 Feb 2025. [AAML]
Unified Prompt Attack Against Text-to-Image Generation Models
Duo Peng, Qiuhong Ke, Mark He Huang, Ping Hu, Jing Liu. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025. 23 Feb 2025.
How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation
Zhuohang Long, Siyuan Wang, Shujun Liu, Yuhang Lai, Xuanjing Huang, Zhongyu Wei. 21 Feb 2025. [AAML]
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
Junkai Chen, Zhijie Deng, Kening Zheng, Yibo Yan, Qi Zheng, PeiJun Wu, Peijie Jiang, Qingbin Liu, Xuming Hu. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 18 Feb 2025. [MU]
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou, Jian Kang, George Kesidis, Lu Lin. 18 Feb 2025.
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang, Jiluan Fan, Anli Yan, Erdun Gao, Xin Lin, Tao Li, Kanghua Mo, Changyu Dong. Computer Vision and Pattern Recognition (CVPR), 2025. 15 Feb 2025. [AAML]
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
H. Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Shahbaz Khan, Salman Khan. 03 Feb 2025. [AAML, MLLM, VLM]
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta
David Khachaturov
Robert D. Mullins
AAML
AuLLM
556
5
0
02 Feb 2025
Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
Yining Wang, Mi Zhang, Junjie Sun, Chenyue Wang, Min Yang, Hui Xue, Jialing Tao, Ranjie Duan, Qingbin Liu. 28 Jan 2025.
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
Shuyang Hao, Bryan Hooi, Qingbin Liu, Kai-Wei Chang, Zi Huang, Yujun Cai. Computer Vision and Pattern Recognition (CVPR), 2024. 27 Nov 2024. [AAML]
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou, Bangjie Yin, Hefei Ling, Qianyu Zhou, Wenxuan Wang. Computer Vision and Pattern Recognition (CVPR), 2024. 23 Nov 2024. [AAML]