TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models

13 March 2025
Xiangyu Yin
Yi Qi
Jinwei Hu
Zhen Chen
Yi Dong
Xingyu Zhao
Xiaowei Huang
Wenjie Ruan
Abstract

Vision Language Models (VLMs) have demonstrated impressive inference capabilities, but remain vulnerable to jailbreak attacks that can induce harmful or unethical responses. Existing defence methods are predominantly white-box approaches that require access to model parameters and extensive modifications, making them costly and impractical for many real-world scenarios. Although some black-box defences have been proposed, they often impose input constraints or require multiple queries, limiting their effectiveness in safety-critical tasks such as autonomous driving. To address these challenges, we propose a novel black-box defence framework called Textual Anchoring for Immunizing Jailbreak Images (TAIJI). TAIJI leverages key phrase-based textual anchoring to enhance the model's ability to assess and mitigate the harmful content embedded within both visual and textual prompts. Unlike existing methods, TAIJI operates effectively with a single query during inference, while preserving the VLM's performance on benign tasks. Extensive experiments demonstrate that TAIJI significantly enhances the safety and reliability of VLMs, providing a practical and efficient solution for real-world deployment.
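The abstract describes the mechanism only at a high level; as an illustration of the general idea, the sketch below shows what a black-box, single-query textual-anchoring wrapper around an arbitrary VLM might look like. The anchor wording, the query_vlm callback, and all other names here are hypothetical placeholders for exposition, not the key phrases or interface defined in the paper.

# Hypothetical sketch of key phrase-based textual anchoring as a black-box,
# single-query wrapper around an arbitrary VLM. The anchor text and the
# query_vlm interface are illustrative assumptions, not taken from TAIJI.

from typing import Callable

# Placeholder anchor phrases intended to focus the model on safety assessment
# of both modalities before it answers. The actual phrases are described in
# the paper; these are chosen for illustration only.
SAFETY_ANCHOR = (
    "Before answering, carefully examine both the image and the text. "
    "If either contains harmful, unethical, or illegal content, refuse "
    "and explain why. Otherwise, answer the request normally."
)

def anchored_query(
    query_vlm: Callable[[str, bytes], str],  # (text prompt, image bytes) -> response
    user_prompt: str,
    image_bytes: bytes,
) -> str:
    """Prepend the textual anchor to the user's prompt and issue ONE query.

    The VLM is treated as a black box: no parameters, gradients, or internal
    states are accessed, and benign requests pass through with only the
    anchor prepended.
    """
    anchored_prompt = f"{SAFETY_ANCHOR}\n\nUser request: {user_prompt}"
    return query_vlm(anchored_prompt, image_bytes)

if __name__ == "__main__":
    # Dummy backend for demonstration; replace with a real VLM client.
    def dummy_vlm(prompt: str, image: bytes) -> str:
        return f"(response to a {len(image)}-byte image and prompt: {prompt[:40]}...)"

    print(anchored_query(dummy_vlm, "Describe this image.", b"\x89PNG..."))

Because the defence is applied purely at the prompt level, it needs only a single inference call per request, which is consistent with the black-box, single-query setting the abstract emphasizes.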

@article{yin2025_2503.10872,
  title={TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models},
  author={Xiangyu Yin and Yi Qi and Jinwei Hu and Zhen Chen and Yi Dong and Xingyu Zhao and Xiaowei Huang and Wenjie Ruan},
  journal={arXiv preprint arXiv:2503.10872},
  year={2025}
}