Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
14 March 2024 · arXiv:2403.09792
Papers citing "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models" (35 papers)
1. REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM
   Madhur Jindal, Saurabh Deshpande · AAML · 07 May 2025

2. VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
   Menglan Chen, Xianghe Pang, Jingjing Dong, Wenhao Wang, Yaxin Du, Siheng Chen · LRM · 17 Apr 2025

3. Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
   Yanbo Wang, Jiyang Guan, Jian Liang, Ran He · 14 Apr 2025

4. A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
   Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera · AAML · 07 Apr 2025

5. Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
   Jiawei Wang, Yushen Zuo, Yuanjun Chai, Z. Liu, Yichen Fu, Yichun Feng, Kin-Man Lam · AAML, VLM · 02 Apr 2025

6. PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
   Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang · MLLM, AAML · 02 Apr 2025

7. Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
   Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang · AAML · 26 Mar 2025

8. MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
   Wenhao You, Bryan Hooi, Yiwei Wang, Y. Wang, Zong Ke, Ming Yang, Zi Huang, Yujun Cai · AAML · 24 Mar 2025

9. Survey of Adversarial Robustness in Multimodal Large Language Models
   Chengze Jiang, Zhuangzhuang Wang, Minjing Dong, Jie Gui · AAML · 18 Mar 2025

10. Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization
    Shuyang Hao, Yiwei Wang, Bryan Hooi, J. Liu, Muhao Chen, Zi Huang, Yujun Cai · AAML, VLM · 14 Mar 2025

11. Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense
    Shuyang Hao, Y. Wang, Bryan Hooi, Ming Yang, J. Liu, Chengcheng Tang, Zi Huang, Yujun Cai · AAML · 14 Mar 2025

12. Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs
    Wenzhuo Xu, Zhipeng Wei, Xiongtao Sun, Deyue Zhang, Dongdong Yang, Quanchen Zou, X. Zhang · AAML · 10 Mar 2025

13. FC-Attack: Jailbreaking Large Vision-Language Models via Auto-Generated Flowcharts
    Ziyi Zhang, Zhen Sun, Z. Zhang, Jihui Guo, Xinlei He · AAML · 28 Feb 2025

14. Understanding and Rectifying Safety Perception Distortion in VLMs
    Xiaohan Zou, Jian Kang, George Kesidis, Lu Lin · 18 Feb 2025

15. Distraction is All You Need for Multimodal Large Language Model Jailbreaking
    Zuopeng Yang, Jiluan Fan, Anli Yan, Erdun Gao, Xin Lin, Tao Li, Kanghua Mo, Changyu Dong · AAML · 15 Feb 2025

16. Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
    H. Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, F. Khan, Salman Khan · AAML, MLLM, VLM · 03 Feb 2025

17. "I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
    Isha Gupta, David Khachaturov, Robert D. Mullins · AAML, AuLLM · 02 Feb 2025

18. Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models
    Abdulkadir Erol, Trilok Padhi, Agnik Saha, Ugur Kursuncu, Mehmet Emin Aktas · 17 Jan 2025

19. Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
    Shuyang Hao, Bryan Hooi, J. Liu, Kai-Wei Chang, Zi Huang, Yujun Cai · AAML · 27 Nov 2024

20. SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach
    Ruoxi Sun, Jiamin Chang, Hammond Pearce, Chaowei Xiao, B. Li, Qi Wu, Surya Nepal, Minhui Xue · 17 Nov 2024

21. Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
    Xuannan Liu, Xing Cui, Peipei Li, Zekun Li, Huaibo Huang, Shuhan Xia, Miaoxuan Zhang, Yueying Zou, Ran He · AAML · 14 Nov 2024

22. Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models
    Qin Liu, Chao Shang, Ling Liu, Nikolaos Pappas, Jie Ma, Neha Anna John, Srikanth Doss Kadarundalagi Raghuram Doss, Lluís Marquez, Miguel Ballesteros, Yassine Benajiba · 11 Oct 2024

23. Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
    Md Zarif Hossain, Ahmed Imteaj · AAML, VLM · 11 Sep 2024

24. A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
    Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu · ELM, AAML · 10 Jul 2024

25. SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
    Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, ..., Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao · VLM · 17 Jun 2024

26. JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
    Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang · ELM · 13 Jun 2024

27. Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character
    Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu · 25 May 2024

28. Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples
    Kuofeng Gao, Jindong Gu, Yang Bai, Shu-Tao Xia, Philip H. S. Torr, Wei Liu, Zhifeng Li · 25 Apr 2024

29. Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security
    Yihe Fan, Yuxin Cao, Ziyu Zhao, Ziyao Liu, Shaofeng Li · 08 Apr 2024

30. Red Teaming Visual Language Models
    Mukai Li, Lei Li, Yuwei Yin, Masood Ahmed, Zhenguang Liu, Qi Liu · VLM · 23 Jan 2024

31. FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
    Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang · MLLM · 09 Nov 2023

32. MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
    Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny · MLLM · 14 Oct 2023

33. On the Adversarial Robustness of Multi-Modal Foundation Models
    Christian Schlarmann, Matthias Hein · AAML · 21 Aug 2023

34. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
    Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark · 23 Aug 2022

35. Training language models to follow instructions with human feedback
    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe · OSLM, ALM · 04 Mar 2022