BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger

BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger

17 August 2024

Haoran Li

Zihao Zheng

Yangqiu Song

Bryan Hooi

Papers citing "BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger"

7 / 7 papers shown

Title
Distraction is All You Need for Multimodal Large Language Model Jailbreaking Zuopeng Yang Jiluan Fan Anli Yan Erdun Gao Xin Lin Tao Li Kanghua mo Changyu Dong AAML 70 0 0 15 Feb 2025
Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey Xuannan Liu Xing Cui Peipei Li Zekun Li Huaibo Huang Shuhan Xia Miaoxuan Zhang Yueying Zou Ran He AAML 51 4 0 14 Nov 2024
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios Yunkai Dang Mengxi Gao Yibo Yan Xin Zou Yanggan Gu Aiwei Liu Xuming Hu 34 4 0 05 Nov 2024
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts Yichen Gong Delong Ran Jinyuan Liu Conglei Wang Tianshuo Cong Anyu Wang Sisi Duan Xiaoyun Wang MLLM 127 116 0 09 Nov 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions Haoran Li Yulin Chen Jinglong Luo Yan Kang Xiaojin Zhang Qi Hu Chunkit Chan Yangqiu Song PILM 19 39 0 16 Oct 2023
On the Adversarial Robustness of Multi-Modal Foundation Models Christian Schlarmann Matthias Hein AAML 90 45 0 21 Aug 2023
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022