
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models

International Conference on Learning Representations (ICLR), 2024
26 July 2023
Erfan Shayegani
Yue Dong
Nael B. Abu-Ghazaleh
arXiv (abs) · PDF · HTML · HuggingFace (2 upvotes)

Papers citing "Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models"

Showing 50 of 161 citing papers.
Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
Xuannan Liu
Xing Cui
Peipei Li
Zekun Li
Huaibo Huang
Shuhan Xia
Miaoxuan Zhang
Yueying Zou
Ran He
AAML
538
24
0
14 Nov 2024
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook
Meng Yang
Tianqing Zhu
Chi Liu
Wanlei Zhou
Shui Yu
Philip S. Yu
AAML, ELM, PILM
309
2
0
12 Nov 2024
Layer-wise Alignment: Examining Safety Alignment Across Image Encoder Layers in Vision Language Models
Saketh Bachu
Erfan Shayegani
Trishna Chakraborty
Rohit Lal
Arindam Dutta
Chengyu Song
Yue Dong
Nael B. Abu-Ghazaleh
Amit K. Roy-Chowdhury
278
0
0
06 Nov 2024
UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
Sejoon Oh
Yiqiao Jin
Megha Sharma
Donghyun Kim
Eric Ma
Gaurav Verma
Srijan Kumar
342
12
0
03 Nov 2024
Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models
Hao Yang
Zhuang Li
Ehsan Shareghi
Gholamreza Haffari
AAML
236
16
0
31 Oct 2024
Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector
Youcheng Huang
Fengbin Zhu
Jingkun Tang
Pan Zhou
Wenqiang Lei
Jiancheng Lv
Tat-Seng Chua
AAML
164
5
0
30 Oct 2024
CLEAR: Character Unlearning in Textual and Visual Modalities
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Alexey Dontsov
Dmitrii Korzh
Alexey Zhavoronkin
Boris Mikheev
Denis Bobkov
Aibek Alanov
Oleg Y. Rogov
Ivan Oseledets
Elena Tutubalina
MU, AILaw, VLM
527
13
0
23 Oct 2024
Bayesian scaling laws for in-context learning
Aryaman Arora
Dan Jurafsky
Christopher Potts
Noah D. Goodman
513
11
0
21 Oct 2024
Hiding-in-Plain-Sight (HiPS) Attack on CLIP for Targetted Object Removal from Images
Arka Daw
Megan Hong-Thanh Chung
Maria Mahbub
Amir Sadovnik
AAML
261
0
0
16 Oct 2024
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
International Conference on Learning Representations (ICLR), 2024
Jaehong Yoon
Shoubin Yu
Vaidehi Patil
Huaxiu Yao
Joey Tianyi Zhou
674
56
0
16 Oct 2024
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models
ACM Multimedia (MM), 2024
Yubo Wang
Chaohu Liu
Yanqiu Qu
Haoyu Cao
Deqiang Jiang
Linli Xu
MLLM, AAML
154
15
0
09 Oct 2024
You Know What I'm Saying: Jailbreak Attack via Implicit Reference
Tianyu Wu
Lingrui Mei
Ruibin Yuan
Lujun Li
Wei Xue
Yike Guo
223
11
0
04 Oct 2024
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
Wenxuan Wang
Kuiyi Gao
Zihan Jia
Youliang Yuan
Shu Yang
S. Wang
Wenxiang Jiao
Zhaopeng Tu
843
7
0
04 Oct 2024
FlipAttack: Jailbreak LLMs via Flipping
Yue Liu
Xiaoxin He
Miao Xiong
Jinlan Fu
Shumin Deng
Bryan Hooi
AAML
244
41
0
02 Oct 2024
VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data
Xuefeng Du
Reshmi Ghosh
Robert Sim
Ahmed Salem
Vitor Carvalho
Emily Lawton
Yixuan Li
Jack W. Stokes
VLM, AAML
222
16
0
01 Oct 2024
Multimodal Pragmatic Jailbreak on Text-to-image Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Tong Liu
Zhixin Lai
Jiawen Wang
Gengyuan Zhang
Shuo Chen
Juil Sock
Vera Demberg
Volker Tresp
Jindong Gu
313
10
0
27 Sep 2024
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration
Lei Li
Renjie Pi
Tianyang Han
Han Wu
Lanqing Hong
Lingpeng Kong
Xin Jiang
Zhenguo Li
308
19
0
17 Sep 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
317
132
0
22 Aug 2024
MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models
Fenghua Weng
Yue Xu
Chengyan Fu
Wenjie Wang
AAML
233
1
0
16 Aug 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Neural Information Processing Systems (NeurIPS), 2024
Jingtong Su
Mingyu Lee
SangKeun Lee
210
22
0
02 Aug 2024
Defending Jailbreak Attack in VLMs via Cross-modality Information Detector
Yue Xu
Xiuyuan Qi
Zhan Qin
Wenjie Wang
AAML
246
6
0
31 Jul 2024
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
ACM Computing Surveys (ACM CSUR), 2024
Feng He
Tianqing Zhu
Dayong Ye
Bo Liu
Wanlei Zhou
Philip S. Yu
PILM, LLMAG, ELM
462
77
0
28 Jul 2024
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
Daizong Liu
Mingyu Yang
Xiaoye Qu
Pan Zhou
Yu Cheng
Wei Hu
ELM, AAML
344
73
0
10 Jul 2024
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin
Leyang Hu
Xinuo Li
Peiyan Zhang
Chonghan Chen
Jun Zhuang
Haohan Wang
PILM
421
60
0
26 Jun 2024
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Siyuan Wang
Zhuohan Long
Zhihao Fan
Zhongyu Wei
220
21
0
21 Jun 2024
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Jiayi Mao
Xueqi Cheng
AAML
208
18
0
17 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
518
60
0
17 Jun 2024
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu
Fan Liu
Hao Liu
AAML
274
31
0
13 Jun 2024
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran
Jinyuan Liu
Yichen Gong
Jingyi Zheng
Xinlei He
Tianshuo Cong
Anyu Wang
ELM
482
23
0
13 Jun 2024
Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
Fan Liu
Zhao Xu
Hao Liu
AAML
258
25
0
07 Jun 2024
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Zonghao Ying
Aishan Liu
Tianyuan Zhang
Zhengmin Yu
Yaning Tan
Xianglong Liu
Dacheng Tao
AAML
386
77
0
06 Jun 2024
Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models
Fengfan Zhou
Qianyu Zhou
Hefei Ling
Xuequan Lu
AAML
477
3
0
27 May 2024
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Trishna Chakraborty
Erfan Shayegani
Zikui Cai
Nael B. Abu-Ghazaleh
M. Salman Asif
Yue Dong
Amit K. Roy-Chowdhury
Chengyu Song
242
23
0
27 May 2024
Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character
Siyuan Ma
Weidi Luo
Yu Wang
Xiaogeng Liu
364
56
0
25 May 2024
Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors
Jiachen Sun
Changsheng Wang
Zhenghao Hu
Yiwei Zhang
Chaowei Xiao
AAML, VLM
228
13
0
17 May 2024
What matters when building vision-language models?
Neural Information Processing Systems (NeurIPS), 2024
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
302
276
0
03 May 2024
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Yingchaojie Feng
Zhizhang Chen
Zhining Kang
Sijia Wang
Haoyu Tian
Wei Zhang
Minfeng Zhu
Wei Chen
339
8
0
12 Apr 2024
Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security
Yihe Fan
Yuxin Cao
Ziyu Zhao
Ziyao Liu
Shaofeng Li
226
22
0
08 Apr 2024
As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
Anjun Hu
Jindong Gu
Francesco Pinto
Konstantinos Kamnitsas
Juil Sock
AAML, SILM
257
9
0
19 Mar 2024
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou
Xiao Wang
Limao Xiong
Han Xia
Yingshuang Gu
...
Lijun Li
Jing Shao
Tao Gui
Xuanjing Huang
231
55
0
18 Mar 2024
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
European Conference on Computer Vision (ECCV), 2024
Yifan Li
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
497
93
0
14 Mar 2024
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
Xijia Tao
Shuai Zhong
Lei Li
Qi Liu
Lingpeng Kong
393
46
0
05 Mar 2024
Accelerating Greedy Coordinate Gradient via Probe Sampling
Yiran Zhao
Wenyue Zheng
Tianle Cai
Xuan Long Do
Kenji Kawaguchi
Anirudh Goyal
Michael Shieh
317
2
0
02 Mar 2024
Coercing LLMs to do and reveal (almost) anything
Jonas Geiping
Alex Stein
Manli Shu
Khalid Saifullah
Yuxin Wen
Tom Goldstein
AAML
238
82
0
21 Feb 2024
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative
Zhen Tan
Chengshuai Zhao
Raha Moraffah
Jiayi Zhang
Yu Kong
Tianlong Chen
Huan Liu
194
21
0
20 Feb 2024
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Christian Schlarmann
Naman D. Singh
Francesco Croce
Matthias Hein
VLM, AAML
389
86
0
19 Feb 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo
Zeyi Liao
Boyuan Zheng
Yu-Chuan Su
Chaowei Xiao
Huan Sun
AAML, LLMAG
291
23
0
15 Feb 2024
Test-Time Backdoor Attacks on Multimodal Large Language Models
Dong Lu
Tianyu Pang
Chao Du
Qian Liu
Xianjun Yang
Min Lin
AAML
383
37
0
13 Feb 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Ye Wang
Jing Jiang
Min Lin
LLMAGLM&Ro
232
97
0
13 Feb 2024
Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
Seyedarmin Azizi
M. Nazemi
Massoud Pedram
ViT, MQ
255
5
0
08 Feb 2024