White-box Multimodal Jailbreaks Against Large Vision-Language Models

28 May 2024

Ruofan Wang

Papers citing "White-box Multimodal Jailbreaks Against Large Vision-Language Models"

Title; Authors; Tags; Counts; Date

TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models; Xiangyu Yin, Yi Qi, Jinwei Hu, Zhen Chen, Yi Dong, Xingyu Zhao, Xiaowei Huang, Wenjie Ruan; no tags; 43 0 0; 13 Mar 2025
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks; Yunhan Zhao, Xiang Zheng, Lin Luo, Yige Li, Xingjun Ma, Yu-Gang Jiang; VLM, AAML; 38 3 0; 28 Oct 2024
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks; Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh; AAML; 135 139 0; 16 Oct 2023
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP; Timo Schick, Sahana Udupa, Hinrich Schütze; no tags; 251 374 0; 28 Feb 2021