Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails

9 February 2025
Yijun Yang
Lichao Wang
Xiao Yang
Lanqing Hong
Jun Zhu
Abstract

Vision Large Language Models (VLLMs) integrate visual data processing, expanding their real-world applications, but also increasing the risk of generating unsafe responses. In response, leading companies have implemented Multi-Layered safety defenses, including alignment training, safety system prompts, and content moderation. However, their effectiveness against sophisticated adversarial attacks remains largely unexplored. In this paper, we propose MultiFaceted Attack, a novel attack framework designed to systematically bypass Multi-Layered Defenses in VLLMs. It comprises three complementary attack facets: Visual Attack that exploits the multimodal nature of VLLMs to inject toxic system prompts through images; Alignment Breaking Attack that manipulates the model's alignment mechanism to prioritize the generation of contrasting responses; and Adversarial Signature that deceives content moderators by strategically placing misleading information at the end of the response. Extensive evaluations on eight commercial VLLMs in a black-box setting demonstrate that MultiFaceted Attack achieves a 61.56% attack success rate, surpassing state-of-the-art methods by at least 42.18%.
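The Visual Attack facet described above relies on a general property of VLLMs: the vision encoder will read and act on text embedded in an image, which lets instructions bypass text-only input filters. The snippet below is a minimal sketch of that general mechanism (rendering instruction text onto an image with Pillow). It is illustrative only, not the paper's actual attack pipeline; the prompt text and output filename are hypothetical placeholders.

# Illustrative sketch: render instruction text into an image so a VLLM's
# vision encoder can read it. This shows the generic mechanism of image-based
# prompt injection, not the MultiFaceted Attack implementation itself.
from PIL import Image, ImageDraw, ImageFont

def text_to_image(prompt_text: str, width: int = 768, height: int = 256) -> Image.Image:
    """Draw plain text onto a white canvas, with simple word wrapping."""
    img = Image.new("RGB", (width, height), color="white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    words, lines, line = prompt_text.split(), [], ""
    for word in words:
        candidate = (line + " " + word).strip()
        if draw.textlength(candidate, font=font) > width - 20:
            # Current line is full; start a new one with this word.
            lines.append(line)
            line = word
        else:
            line = candidate
    lines.append(line)
    for i, row in enumerate(lines):
        draw.text((10, 10 + i * 14), row, fill="black", font=font)
    return img

if __name__ == "__main__":
    # Benign placeholder text; an attacker would instead embed a malicious system prompt.
    text_to_image("Example instruction text rendered as an image.").save("injected_prompt.png")

The resulting image would be attached to an otherwise innocuous text query; because safety system prompts and moderation often inspect only the text channel, the embedded instructions can reach the model unfiltered.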

@article{yang2025_2502.05772,
  title={Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails},
  author={Yijun Yang and Lichao Wang and Xiao Yang and Lanqing Hong and Jun Zhu},
  journal={arXiv preprint arXiv:2502.05772},
  year={2025}
}