Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models

International Conference on Learning Representations (ICLR), 2023
26 July 2023
Erfan Shayegani
Yue Dong
Nael B. Abu-Ghazaleh
ArXiv (abs) · PDF · HTML · HuggingFace (2 upvotes)

Papers citing "Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models"

50 / 161 papers shown
Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot
Sheng Hang
Chaoxiang He
Hongsheng Hu
Hanqing Hu
B. Zhu
Shi-Feng Sun
Dawu Gu
Shuo Wang
137
0
0
04 Dec 2025
DefenSee: Dissecting Threat from Sight and Text - A Multi-View Defensive Pipeline for Multi-modal Jailbreaks
Zihao Wang
K. Fok
V. Thing
AAML
158
0
0
01 Dec 2025
Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings
Fatemeh Akbarian
Anahita Baninajjar
Yingyi Zhang
Ananth Balashankar
Amir Aminifar
AAML
199
0
0
26 Nov 2025
GuardTrace-VL: Detecting Unsafe Multimodal Reasoning via Iterative Safety Supervision
Yuxiao Xiang
Junchi Chen
Zhenchao Jin
Changtao Miao
Haojie Yuan
Qi Chu
Tao Gong
Nenghai Yu
LRM
201
0
0
26 Nov 2025
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
Naifu Zhang
Wei Tao
Xi Xiao
Qianpu Sun
Yuxin Zheng
Wentao Mo
Peiqiang Wang
Nan Zhang
AAML, VLM
790
0
0
26 Nov 2025
On the Feasibility of Hijacking MLLMs' Decision Chain via One Perturbation
Changyue Li
Jiaying Li
Youliang Yuan
Jiaming He
Zhicong Huang
Pinjia He
AAML
242
0
0
25 Nov 2025
V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
Sen Nie
Jie M. Zhang
Jianxin Yan
Shiguang Shan
Xilin Chen
AAML
290
0
0
25 Nov 2025
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
Xuankun Rong
Wenke Huang
Tingfeng Wang
Daiguo Zhou
Bo Du
Mang Ye
LRM
230
0
0
17 Nov 2025
Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models
Juan Ren
Mark Dras
Usman Naseem
AAML
108
1
0
29 Oct 2025
Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses
Xingwei Zhong
K. Fok
V. Thing
AAML
152
0
0
24 Oct 2025
VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models
Qilin Liao
Anamika Lochab
Ruqi Zhang
AAML, VLM
196
0
0
20 Oct 2025
CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
Xu-Yao Zhang
Hao Li
Zhichao Lu
AAML
107
0
0
20 Oct 2025
Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks
ChenYu Wu
Yi Wang
Yang Liao
143
0
0
16 Oct 2025
SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
Juan Ren
Mark Dras
Usman Naseem
AAML
176
2
0
15 Oct 2025
Locket: Robust Feature-Locking Technique for Language Models
Lipeng He
Vasisht Duddu
Nadarajah Asokan
98
0
0
14 Oct 2025
SafeMT: Multi-turn Safety for Multimodal Language Models
Han Zhu
Juntao Dai
Jiaming Ji
Haoran Li
Chengkun Cai
...
Chi-Min Chan
Boyuan Chen
Yaodong Yang
Sirui Han
Yike Guo
147
2
0
14 Oct 2025
Deep Research Brings Deeper Harm
Shuo Chen
Zonggen Li
Zhen Han
Bailan He
Tong Liu
Haokun Chen
Georg Groh
Philip Torr
Volker Tresp
Jindong Gu
172
0
0
13 Oct 2025
Multimodal Safety Evaluation in Generative Agent Social Simulations
Alhim Vera
Karen Sanchez
Carlos Hinojosa
Haidar Bin Hamid
Donghoon Kim
Bernard Ghanem
LLMAG, EGVM
161
1
0
09 Oct 2025
VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands
Aofan Liu
Lulu Tang
MLLM, VLM
242
0
0
09 Oct 2025
Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
Erfan Shayegani
Keegan Hines
Yue Dong
Nael B. Abu-Ghazaleh
Roman Lutz
Spencer Whitehead
Vidhisha Balachandran
Besmira Nushi
Vibhav Vineet
148
0
0
02 Oct 2025
Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
Isha Gupta
Rylan Schaeffer
Joshua Kazdan
Katja Filippova
Sanmi Koyejo
OOD, AAML
290
1
0
01 Oct 2025
Backdoor Attacks Against Speech Language Models
Alexandrine Fortier
Thomas Thebaud
Jesus Villalba
Najim Dehak
P. Cardinal
AuLLM
285
0
0
01 Oct 2025
AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond
Shangding Gu
Xiaohan Wang
Donghao Ying
Haoyu Zhao
Runing Yang
...
Marco Pavone
Serena Yeung-Levy
Jun Wang
Dawn Song
C. Spanos
117
0
0
30 Sep 2025
VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models
Ravikumar Balakrishnan
Mansi Phute
LLMSV
163
1
0
29 Sep 2025
WAREX: Web Agent Reliability Evaluation on Existing Benchmarks
Su Kara
Fazle Faisal
Suman Nath
178
0
0
28 Sep 2025
Preventing Robotic Jailbreaking via Multimodal Domain Adaptation
Francesco Marchiori
Rohan Sinha
Christopher Agia
Alexander Robey
George Pappas
Mauro Conti
Marco Pavone
AAML
130
0
0
27 Sep 2025
FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
Runqi Lin
Alasdair Paren
Suqin Yuan
Muyang Li
Juil Sock
Adel Bibi
Tongliang Liu
AAML
214
0
0
25 Sep 2025
JaiLIP: Jailbreaking Vision-Language Models via Loss Guided Image Perturbation
Md Jueal Mia
M. Hadi Amini
AAML, VLM
242
0
0
24 Sep 2025
Steering Multimodal Large Language Models Decoding for Context-Aware Safety
Zheyuan Liu
Zhangchen Xu
Guangyao Dou
Xiangchi Yuan
Zhaoxuan Tan
Radha Poovendran
Meng Jiang
148
1
0
23 Sep 2025
AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs
Debdeep Sanyal
Manodeep Ray
Murari Mandal
AAML
188
0
0
06 Sep 2025
Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios
Jingen Qu
L. Li
Bo Zhang
Yichen Yan
Jing Shao
128
1
0
04 Sep 2025
On Surjectivity of Neural Networks: Can you elicit any behavior from your model?
Haozhe Jiang
Nika Haghtalab
190
3
0
26 Aug 2025
Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas
Mao Nishino
Samuel Jacob Chacko
Xiuwen Liu
AAML
148
2
0
20 Aug 2025
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu
Xuying Li
Qirui Wang
Yuji Kosuga
Mengqiu Tian
Zhuo Li
AAML, SILM
186
0
0
14 Aug 2025
Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems
Yutong Wu
Jie Zhang
Yiming Li
Chao Zhang
Qing Guo
Nils Lukas
Tianwei Zhang
AAML
163
0
0
12 Aug 2025
VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
Mansi Phute
Ravikumar Balakrishnan
LLMSV
92
0
0
11 Aug 2025
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Shuang Liang
Zhihao Xu
Jialing Tao
Hui Xue
Xiting Wang
AAML
186
0
0
08 Aug 2025
Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM
Chi Zhang
Changjia Zhu
Junjie Xiong
Xiaoran Xu
Jinkui Chi
Yao Liu
Zhuo Lu
ELM
203
4
0
07 Aug 2025
JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering
Renmiao Chen
Shiyao Cui
X. Y. Huang
Chengwei Pan
Victor Shea-Jay Huang
Qinglin Zhang
Xuan Ouyang
Zhexin Zhang
Huaimin Wang
Shiyu Huang
AAML
123
3
0
07 Aug 2025
Adversarial-Guided Diffusion for Multimodal LLM Attacks
Chengwei Xia
Fan Ma
Ruijie Quan
Kun Zhan
Yi Yang
DiffM
196
1
0
31 Jul 2025
Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models
Wanying Wang
Zeyu Ma
Han Zheng
Xin Tan
Mingang Chen
150
0
0
29 Jul 2025
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures
Dezhang Kong
Shi Lin
Zhenhua Xu
Z. J. Wang
Minghao Li
...
Ningyu Zhang
Chaochao Chen
Chunming Wu
Muhammad Khurram Khan
Meng Han
LLMAG
341
27
0
24 Jun 2025
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
Lei Jiang
Zixun Zhang
Zizhou Wang
Xiaobing Sun
Zhen Li
Liangli Zhen
Xiaohua Xu
AAML
232
2
0
20 Jun 2025
VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
X. Wang
Tianliang Yao
S. Chen
Runqi Wang
Lei YE
Kuofeng Gao
Yi Huang
Yuan Yao
VLM
185
1
0
18 Jun 2025
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
Yanxu Mao
Tiehan Cui
Peipei Liu
Datao You
Hongsong Zhu
AAML
342
4
0
18 Jun 2025
Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025
Zonghao Ying
Siyang Wu
Run Hao
Peng Ying
Shixuan Sun
...
Xianglong Liu
Dawn Song
Yaoyao Liu
Juil Sock
Dacheng Tao
280
10
0
14 Jun 2025
DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Yitong Zhang
Jia Li
L. Cai
Ge Li
VLM
325
3
0
11 Jun 2025
Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
Kun Zhang
Le Wu
Kui Yu
Guangyi Lv
Dacao Zhang
AAML, ELM
338
1
0
08 Jun 2025
Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques
Jisu An
Junseok Lee
Jeoungeun Lee
Yongseok Son
441
2
0
05 Jun 2025
Misalignment or misuse? The AGI alignment tradeoff
Philosophical Studies (Philos. Stud.), 2025
Max Hellrigel-Holderbaum
Leonard Dung
277
2
0
04 Jun 2025