Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Erfan Shayegani, Yue Dong, Nael B. Abu-Ghazaleh
International Conference on Learning Representations (ICLR), 2024
arXiv:2307.14539, 26 July 2023
Papers citing "Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models"
50 of 162 citing papers shown (page 2 of 4)
Misalignment or misuse? The AGI alignment tradeoff
Max Hellrigel-Holderbaum, Leonard Dung. Philosophical Studies (Philos. Stud.), 2025. 04 Jun 2025.
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
Youze Wang, Wenbo Hu, Yinpeng Dong, Jing Liu, Hanwang Zhang, Richang Hong. 02 Jun 2025.
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
Fauzan Farooqui, Thy Thy Tran, Preslav Nakov, Iryna Gurevych. Volume 1 (V1), 2025. 31 May 2025. [MLLM, AAML]
The Security Threat of Compressed Projectors in Large Vision-Language Models
Yudong Zhang, Ruobing Xie, Xingwu Sun, Jiansheng Chen, Zhanhui Kang, Di Wang, Yu Wang. 31 May 2025.
Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack
Juan Ren, Mark Dras, Usman Naseem. 28 May 2025. [AAML]
System Prompt Extraction Attacks and Defenses in Large Language Models
B. Das, M. H. Amini, Yanzhao Wu. 27 May 2025. [AAML]
JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models
Jiaxin Song, Yixu Wang, Jie Li, Rui Yu, Yan Teng, Jiabo He, Yingchun Wang. 26 May 2025. [AAML]
Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts
H. Kim, Minbeom Kim, Wonjun Lee, Kihyun Kim, Changick Kim. 26 May 2025.
Safety Alignment via Constrained Knowledge Unlearning
Zesheng Shi, Yucheng Zhou, Jing Li. 24 May 2025. [MU, KELM, AAML]
Robustifying Vision-Language Models via Dynamic Token Reweighting
Tanqiu Jiang, Jiacheng Liang, Rongyi Zhu, Jiawei Zhou, Fenglong Ma, Ting Wang. 22 May 2025. [AAML]
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu, Zekun Li, Xue Sun, Peipei Li, Shuhan Xia, Xing Cui, Huaibo Huang, Xi Yang, Ran He. 17 May 2025. [EGVM, AAML]
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang, Sarah Monazam Erfani, Yige Li, Jiabo He, James Bailey. 08 May 2025. [AAML]
REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM
Madhur Jindal, Saurabh Deshpande. International Joint Conference on Artificial Intelligence (IJCAI), 2024. 07 May 2025. [AAML]
Adversarial Robustness Analysis of Vision-Language Models in Medical Image Segmentation
Anjila Budathoki, Manish Dhakal. 05 May 2025. [AAML]
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jing Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, ..., Yingshui Tan, Yanan Wu, Jihao Gu, Yongbin Li, Jun Zhu. North American Chapter of the Association for Computational Linguistics (NAACL), 2025. 25 Apr 2025. [MLLM]
Manipulating Multimodal Agents via Cross-Modal Prompt Injection
Le Wang, Zonghao Ying, Tianyuan Zhang, Yaning Tan, Shengshan Hu, Mingchuan Zhang, A. Liu, Xianglong Liu. 19 Apr 2025. [AAML]
QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models
Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Yu Wang. North American Chapter of the Association for Computational Linguistics (NAACL), 2025. 15 Apr 2025. [AAML]
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
Yanbo Wang, Jiyang Guan, Jian Liang, Ran He. Computer Vision and Pattern Recognition (CVPR), 2025. 14 Apr 2025.
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, ..., Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu. 13 Apr 2025. [LLMSV, AAML]
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera. 07 Apr 2025. [AAML]
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
Yifan Wang, Runjin Chen, Bolian Li, David Cho, Yihe Deng, Ruqi Zhang, Tianlong Chen, Zhangyang Wang, A. Grama, Junyuan Hong. 03 Apr 2025. [SyDa]
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Jiawei Wang, Yushen Zuo, Yuanjun Chai, Ziqiang Liu, Yichen Fu, Yichun Feng, Kin-Man Lam. 02 Apr 2025. [AAML, VLM]
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu, Tianyi Gui, Yu Liu, Linli Xu. 02 Apr 2025. [VLM, AAML]
PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang. 02 Apr 2025. [MLLM, AAML]
Emerging Cyber Attack Risks of Medical AI Agents
Jianing Qiu, Lin Li, Jiankai Sun, Hao Wei, Zhe Xu, K. Lam, Wu Yuan. 02 Apr 2025. [AAML]
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani, G M Shahariar, Sara Abdali, Lei Yu, Nael B. Abu-Ghazaleh, Yue Dong. 01 Apr 2025. [AAML]
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang. Computer Vision and Pattern Recognition (CVPR), 2025. 26 Mar 2025. [AAML]
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
Wenhao You, Bryan Hooi, Yiwei Wang, Longji Xu, Zong Ke, Ming Yang, Zi Huang, Yujun Cai. 24 Mar 2025. [AAML]
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer. 17 Mar 2025. [AAML]
Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense
Shuyang Hao, Yijiao Wang, Bryan Hooi, Ming Yang, Qingbin Liu, Chengcheng Tang, Zi Huang, Yujun Cai. 14 Mar 2025. [AAML]
Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization
Shuyang Hao, Yiwei Wang, Bryan Hooi, Qingbin Liu, Muhao Chen, Zi Huang, Yujun Cai. 14 Mar 2025. [AAML, VLM]
Probabilistic Modeling of Jailbreak on Multimodal LLMs: From Quantification to Application
Wenzhuo Xu, Zhipeng Wei, Xiongtao Sun, Zonghao Ying, Deyue Zhang, Dongdong Yang, Xinming Zhang, Quanchen Zou. 10 Mar 2025. [AAML]
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models
Ruidong Chen, Honglin Guo, Lanjun Wang, Chenyu Zhang, Weizhi Nie, An-an Liu. 10 Mar 2025. [DiffM]
RedDiffuser: Red Teaming Vision-Language Models for Toxic Continuation via Reinforced Stable Diffusion
Ruofan Wang, Xiang Zheng, Xinyu Wang, Cong Wang, Jie Zhang, Yu-Gang Jiang. 08 Mar 2025. [VLM]
CeTAD: Towards Certified Toxicity-Aware Distance in Vision Language Models
Xiangyu Yin, Jiaxu Liu, Zhen Chen, Jinwei Hu, Yi Dong, Xiaowei Huang, Wenjie Ruan. 08 Mar 2025. [AAML]
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
Songlong Xing, Zhengyu Zhao, Andrii Zadaianchuk. Computer Vision and Pattern Recognition (CVPR), 2025. 05 Mar 2025. [AAML]
FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts
Ziyi Zhang, Zhen Sun, Zheng Zhang, Jihui Guo, Xinlei He. 28 Feb 2025. [AAML]
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan, Yongtao Wu, Elias Abad Rocamora, Grigorios G. Chrysos, Volkan Cevher. 24 Feb 2025. [AAML]
EigenShield: Causal Subspace Filtering via Random Matrix Theory for Adversarially Robust Vision-Language Models
Nastaran Darabi, Devashri Naik, Sina Tayebati, Dinithi Jayasuriya, Ranganath Krishnan, A. R. Trivedi. 24 Feb 2025. [AAML]
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Yubo Wang, Jianting Tang, Chaohu Liu, Linli Xu. International Conference on Learning Representations (ICLR), 2025. 23 Feb 2025. [AAML]
Unified Prompt Attack Against Text-to-Image Generation Models
Duo Peng, Qiuhong Ke, Mark He Huang, Ping Hu, Jing Liu. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025. 23 Feb 2025.
How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation
Zhuohang Long, Siyuan Wang, Shujun Liu, Yuhang Lai, Xuanjing Huang, Zhongyu Wei. 21 Feb 2025. [AAML]
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
Junkai Chen, Zhijie Deng, Kening Zheng, Yibo Yan, Qi Zheng, PeiJun Wu, Peijie Jiang, Qingbin Liu, Xuming Hu. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 18 Feb 2025. [MU]
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou, Jian Kang, George Kesidis, Lu Lin. 18 Feb 2025.
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang, Jiluan Fan, Anli Yan, Erdun Gao, Xin Lin, Tao Li, Kanghua Mo, Changyu Dong. Computer Vision and Pattern Recognition (CVPR), 2025. 15 Feb 2025. [AAML]
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
H. Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Shahbaz Khan, Salman Khan. 03 Feb 2025. [AAML, MLLM, VLM]
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta
David Khachaturov
Robert D. Mullins
AAML
AuLLM
556
5
0
02 Feb 2025
Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
Yining Wang, Mi Zhang, Junjie Sun, Chenyue Wang, Min Yang, Hui Xue, Jialing Tao, Ranjie Duan, Qingbin Liu. 28 Jan 2025.
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
Shuyang Hao, Bryan Hooi, Qingbin Liu, Kai-Wei Chang, Zi Huang, Yujun Cai. Computer Vision and Pattern Recognition (CVPR), 2024. 27 Nov 2024. [AAML]
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou, Bangjie Yin, Hefei Ling, Qianyu Zhou, Wenxuan Wang. Computer Vision and Pattern Recognition (CVPR), 2024. 23 Nov 2024. [AAML]