Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models

International Conference on Learning Representations (ICLR), 2023
26 July 2023
Erfan Shayegani
Yue Dong
Nael B. Abu-Ghazaleh
ArXiv (abs) · PDF · HTML · HuggingFace (2 upvotes)

Papers citing "Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models"

50 / 161 papers shown
Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot
Sheng Hang
Chaoxiang He
Hongsheng Hu
Hanqing Hu
B. Zhu
Shi-Feng Sun
Dawu Gu
Shuo Wang
137
0
0
04 Dec 2025
DefenSee: Dissecting Threat from Sight and Text - A Multi-View Defensive Pipeline for Multi-modal Jailbreaks
Zihao Wang
K. Fok
V. Thing
AAML
158
0
0
01 Dec 2025
Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings
Fatemeh Akbarian
Anahita Baninajjar
Yingyi Zhang
Ananth Balashankar
Amir Aminifar
AAML
199
0
0
26 Nov 2025
GuardTrace-VL: Detecting Unsafe Multimodal Reasoning via Iterative Safety Supervision
Yuxiao Xiang
Junchi Chen
Zhenchao Jin
Changtao Miao
Haojie Yuan
Qi Chu
Tao Gong
Nenghai Yu
LRM
201
0
0
26 Nov 2025
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
Naifu Zhang
Wei Tao
Xi Xiao
Qianpu Sun
Yuxin Zheng
Wentao Mo
Peiqiang Wang
Nan Zhang
AAML, VLM
790
0
0
26 Nov 2025
On the Feasibility of Hijacking MLLMs' Decision Chain via One Perturbation
Changyue Li
Jiaying Li
Youliang Yuan
Jiaming He
Zhicong Huang
Pinjia He
AAML
242
0
0
25 Nov 2025
V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
Sen Nie
Jie M. Zhang
Jianxin Yan
Shiguang Shan
Xilin Chen
AAML
290
0
0
25 Nov 2025
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
Xuankun Rong
Wenke Huang
Tingfeng Wang
Daiguo Zhou
Bo Du
Mang Ye
LRM
230
0
0
17 Nov 2025
Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models
Juan Ren
Mark Dras
Usman Naseem
AAML
108
1
0
29 Oct 2025
Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses
Xingwei Zhong
K. Fok
V. Thing
AAML
152
0
0
24 Oct 2025
VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models
Qilin Liao
Anamika Lochab
Ruqi Zhang
AAML, VLM
196
0
0
20 Oct 2025
CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
Xu-Yao Zhang
Hao Li
Zhichao Lu
AAML
107
0
0
20 Oct 2025
Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks
ChenYu Wu
Yi Wang
Yang Liao
143
0
0
16 Oct 2025
SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
Juan Ren
Mark Dras
Usman Naseem
AAML
176
2
0
15 Oct 2025
Locket: Robust Feature-Locking Technique for Language Models
Lipeng He
Vasisht Duddu
Nadarajah Asokan
98
0
0
14 Oct 2025
SafeMT: Multi-turn Safety for Multimodal Language Models
Han Zhu
Juntao Dai
Jiaming Ji
Haoran Li
Chengkun Cai
...
Chi-Min Chan
Boyuan Chen
Yaodong Yang
Sirui Han
Yike Guo
147
2
0
14 Oct 2025
Deep Research Brings Deeper Harm
Shuo Chen
Zonggen Li
Zhen Han
Bailan He
Tong Liu
Haokun Chen
Georg Groh
Philip Torr
Volker Tresp
Jindong Gu
172
0
0
13 Oct 2025
Multimodal Safety Evaluation in Generative Agent Social Simulations
Alhim Vera
Karen Sanchez
Carlos Hinojosa
Haidar Bin Hamid
Donghoon Kim
Bernard Ghanem
LLMAG, EGVM
161
1
0
09 Oct 2025
VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands
Aofan Liu
Lulu Tang
MLLM, VLM
242
0
0
09 Oct 2025
Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
Erfan Shayegani
Keegan Hines
Yue Dong
Nael B. Abu-Ghazaleh
Roman Lutz
Spencer Whitehead
Vidhisha Balachandran
Besmira Nushi
Vibhav Vineet
148
0
0
02 Oct 2025
Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
Isha Gupta
Rylan Schaeffer
Joshua Kazdan
Katja Filippova
Sanmi Koyejo
OOD, AAML
290
1
0
01 Oct 2025
Backdoor Attacks Against Speech Language Models
Alexandrine Fortier
Thomas Thebaud
Jesus Villalba
Najim Dehak
P. Cardinal
AuLLM
285
0
0
01 Oct 2025
AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond
Shangding Gu
Xiaohan Wang
Donghao Ying
Haoyu Zhao
Runing Yang
...
Marco Pavone
Serena Yeung-Levy
Jun Wang
Dawn Song
C. Spanos
117
0
0
30 Sep 2025
VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models
Ravikumar Balakrishnan
Mansi Phute
LLMSV
163
1
0
29 Sep 2025
WAREX: Web Agent Reliability Evaluation on Existing Benchmarks
Su Kara
Fazle Faisal
Suman Nath
178
0
0
28 Sep 2025
Preventing Robotic Jailbreaking via Multimodal Domain Adaptation
Francesco Marchiori
Rohan Sinha
Christopher Agia
Alexander Robey
George Pappas
Mauro Conti
Marco Pavone
AAML
130
0
0
27 Sep 2025
FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
Runqi Lin
Alasdair Paren
Suqin Yuan
Muyang Li
Juil Sock
Adel Bibi
Tongliang Liu
AAML
214
0
0
25 Sep 2025
JaiLIP: Jailbreaking Vision-Language Models via Loss Guided Image Perturbation
Md Jueal Mia
M. Hadi Amini
AAML, VLM
242
0
0
24 Sep 2025
Steering Multimodal Large Language Models Decoding for Context-Aware Safety
Zheyuan Liu
Zhangchen Xu
Guangyao Dou
Xiangchi Yuan
Zhaoxuan Tan
Radha Poovendran
Meng Jiang
148
1
0
23 Sep 2025
AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs
Debdeep Sanyal
Manodeep Ray
Murari Mandal
AAML
188
0
0
06 Sep 2025
Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios
Jingen Qu
L. Li
Bo Zhang
Yichen Yan
Jing Shao
128
1
0
04 Sep 2025
On Surjectivity of Neural Networks: Can you elicit any behavior from your model?
Haozhe Jiang
Nika Haghtalab
190
3
0
26 Aug 2025
Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas
Mao Nishino
Samuel Jacob Chacko
Xiuwen Liu
AAML
148
2
0
20 Aug 2025
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu
Xuying Li
Qirui Wang
Yuji Kosuga
Mengqiu Tian
Zhuo Li
AAML, SILM
186
0
0
14 Aug 2025
Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems
Yutong Wu
Jie Zhang
Yiming Li
Chao Zhang
Qing Guo
Nils Lukas
Tianwei Zhang
AAML
163
0
0
12 Aug 2025
VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
Mansi Phute
Ravikumar Balakrishnan
LLMSV
92
0
0
11 Aug 2025
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Shuang Liang
Zhihao Xu
Jialing Tao
Hui Xue
Xiting Wang
AAML
186
0
0
08 Aug 2025
Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM
Chi Zhang
Changjia Zhu
Junjie Xiong
Xiaoran Xu
Jinkui Chi
Yao Liu
Zhuo Lu
ELM
203
4
0
07 Aug 2025
JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering
Renmiao Chen
Shiyao Cui
X. Y. Huang
Chengwei Pan
Victor Shea-Jay Huang
Qinglin Zhang
Xuan Ouyang
Zhexin Zhang
Huaimin Wang
Shiyu Huang
AAML
123
3
0
07 Aug 2025
Adversarial-Guided Diffusion for Multimodal LLM Attacks
Chengwei Xia
Fan Ma
Ruijie Quan
Kun Zhan
Yi Yang
DiffM
196
1
0
31 Jul 2025
Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models
Wanying Wang
Zeyu Ma
Han Zheng
Xin Tan
Mingang Chen
150
0
0
29 Jul 2025
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures
Dezhang Kong
Shi Lin
Zhenhua Xu
Z. J. Wang
Minghao Li
...
Ningyu Zhang
Chaochao Chen
Chunming Wu
Muhammad Khurram Khan
Meng Han
LLMAG
341
27
0
24 Jun 2025
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
Lei Jiang
Zixun Zhang
Zizhou Wang
Xiaobing Sun
Zhen Li
Liangli Zhen
Xiaohua Xu
AAML
232
2
0
20 Jun 2025
VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
X. Wang
Tianliang Yao
S. Chen
Runqi Wang
Lei YE
Kuofeng Gao
Yi Huang
Yuan Yao
VLM
185
1
0
18 Jun 2025
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
Yanxu Mao
Tiehan Cui
Peipei Liu
Datao You
Hongsong Zhu
AAML
342
4
0
18 Jun 2025
Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025
Zonghao Ying
Siyang Wu
Run Hao
Peng Ying
Shixuan Sun
...
Xianglong Liu
Dawn Song
Yaoyao Liu
Juil Sock
Dacheng Tao
280
10
0
14 Jun 2025
DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Yitong Zhang
Jia Li
L. Cai
Ge Li
VLM
325
3
0
11 Jun 2025
Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
Kun Zhang
Le Wu
Kui Yu
Guangyi Lv
Dacao Zhang
AAML, ELM
338
1
0
08 Jun 2025
Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques
Jisu An
Junseok Lee
Jeoungeun Lee
Yongseok Son
441
2
0
05 Jun 2025
Misalignment or misuse? The AGI alignment tradeoff
Philosophical Studies (Philos. Stud.), 2025
Max Hellrigel-Holderbaum
Leonard Dung
277
2
0
04 Jun 2025