ResearchTrend.AI
Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack

27 February 2025
Chenhe Gu
Jindong Gu
Andong Hua
Yao Qin
Abstract

Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained attention for their capabilities in image recognition and understanding. However, while MLLMs are vulnerable to adversarial attacks, the transferability of these attacks across different models remains limited, especially under the targeted attack setting. Existing methods primarily focus on vision-specific perturbations but struggle with the complex nature of vision-language modality alignment. In this work, we introduce the Dynamic Vision-Language Alignment (DynVLA) Attack, a novel approach that injects dynamic perturbations into the vision-language connector to enhance generalization across the diverse vision-language alignments of different models. Our experimental results show that DynVLA significantly improves the transferability of adversarial examples across various MLLMs, including BLIP2, InstructBLIP, MiniGPT4, LLaVA, and closed-source models such as Gemini.
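The abstract does not detail DynVLA's mechanism, so the sketch below only illustrates the general setup it builds on: a targeted attack that optimizes an image, within an L-infinity budget, so that its embedding after a surrogate model's vision-language connector moves toward a target embedding. Every component here (the linear "vision encoder" and "connector", the dimensions, the PGD-style sign-gradient loop) is a hypothetical toy stand-in, not the paper's method.

```python
import random

random.seed(0)

def randmat(rows, cols):
    # Random Gaussian matrix as a toy stand-in for learned weights (hypothetical).
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    # Row-vector times matrix: returns v @ M.
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# Toy frozen surrogate: "vision encoder" (16 -> 8) and
# "vision-language connector" (8 -> 4, the language-model input space).
W_vis = randmat(16, 8)
W_conn = randmat(8, 4)

def embed(image):
    # Image -> visual features -> language-space embedding.
    return matvec(W_conn, matvec(W_vis, image))

def loss(image, target):
    # Squared distance between the image's embedding and the target embedding.
    e = embed(image)
    return sum((e[j] - target[j]) ** 2 for j in range(len(target)))

def grad(image, target):
    # Analytic gradient of the loss w.r.t. the image (chain rule through
    # the two linear maps).
    e = embed(image)
    diff = [e[j] - target[j] for j in range(len(target))]
    g_feat = [sum(2 * diff[j] * W_conn[k][j] for j in range(len(diff)))
              for k in range(len(W_conn))]
    return [sum(W_vis[i][k] * g_feat[k] for k in range(len(g_feat)))
            for i in range(len(image))]

def sign(x):
    return (x > 0) - (x < 0)

def pgd_attack(image, target, eps=0.5, step=0.05, iters=100):
    # Sign-gradient descent projected back into the L_inf ball around the
    # clean image; keep the best (lowest-loss) iterate seen.
    adv = list(image)
    best, best_loss = list(adv), loss(adv, target)
    for _ in range(iters):
        g = grad(adv, target)
        adv = [min(max(adv[i] - step * sign(g[i]), image[i] - eps),
                   image[i] + eps) for i in range(len(image))]
        cur = loss(adv, target)
        if cur < best_loss:
            best, best_loss = list(adv), cur
    return best

image = [random.gauss(0, 1) for _ in range(16)]    # toy clean image
target = [random.gauss(0, 1) for _ in range(4)]    # toy target-text embedding
adv = pgd_attack(image, target)
```

In this framing, transferability is limited because the perturbation overfits the one surrogate alignment; DynVLA's stated contribution is to make the connector-level perturbation dynamic so it generalizes across the different alignments learned by different MLLMs.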

@article{gu2025_2502.19672,
  title={Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack},
  author={Chenhe Gu and Jindong Gu and Andong Hua and Yao Qin},
  journal={arXiv preprint arXiv:2502.19672},
  year={2025}
}