Coordinated Robustness Evaluation Framework for Vision-Language Models

5 June 2025
Ashwin Ramesh Babu
Sajad Mousavi
Vineet Gundecha
Sahand Ghorbanpour
Avisek Naug
Antonio Guillen
Ricardo Luna Gutierrez
Soumyendu Sarkar
Main: 7 pages · Bibliography: 2 pages · 3 figures · 2 tables
Abstract

Vision-language models, which integrate computer vision and natural language processing capabilities, have demonstrated significant advances in tasks such as image captioning and visual question answering. However, like traditional models, they are susceptible to small perturbations, posing a challenge to their robustness, particularly in deployment scenarios. Evaluating the robustness of these models requires perturbing both the vision and language modalities in order to capture their inter-modal dependencies. In this work, we train a generic surrogate model that takes both image and text as input and generates a joint representation, which is then used to craft adversarial perturbations for both the text and image modalities. This coordinated attack strategy is evaluated on visual question answering and visual reasoning datasets using various state-of-the-art vision-language models. Our results indicate that the proposed strategy outperforms recent multi-modal and single-modality attacks, demonstrating its effectiveness in compromising the robustness of several state-of-the-art pre-trained multi-modal models such as InstructBLIP and ViLT.
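As a rough illustration of how such a coordinated attack can be wired together, the Python sketch below pairs L-infinity PGD on the image with a HotFlip-style token substitution on the text, both driven by the gradient of a single surrogate joint embedding. The surrogate architecture, the cosine-distance objective, and the budgets (eps, alpha, steps) are illustrative assumptions, not the paper's actual implementation.

# Hypothetical sketch of a coordinated image+text perturbation loop.
# The surrogate, loss, and budgets here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SurrogateJointEncoder(nn.Module):
    # Toy surrogate that fuses an image and a token sequence into one vector.
    def __init__(self, vocab_size=30522, txt_dim=256, img_dim=256, joint_dim=256):
        super().__init__()
        self.txt_embed = nn.Embedding(vocab_size, txt_dim)
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, img_dim),
        )
        self.fuse = nn.Linear(txt_dim + img_dim, joint_dim)

    def joint(self, image, txt_embeds):
        t = txt_embeds.mean(dim=1)                       # (B, txt_dim)
        v = self.img_encoder(image)                      # (B, img_dim)
        return self.fuse(torch.cat([t, v], dim=-1))      # (B, joint_dim)

    def forward(self, image, token_ids):
        return self.joint(image, self.txt_embed(token_ids))

def coordinated_attack(model, image, token_ids, eps=8/255, alpha=2/255, steps=10):
    # Shared objective: push the joint embedding away from the clean one,
    # using the same gradient signal for both the pixel and token attacks.
    for p in model.parameters():
        p.requires_grad_(False)
    with torch.no_grad():
        clean = model(image, token_ids)

    # Image side: standard L-inf PGD, ascending the joint-space loss.
    adv_image = image.clone()
    for _ in range(steps):
        adv_image.requires_grad_(True)
        loss = -F.cosine_similarity(model(adv_image, token_ids), clean, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            adv_image = adv_image + alpha * adv_image.grad.sign()
            adv_image = image + (adv_image - image).clamp(-eps, eps)
            adv_image = adv_image.clamp(0, 1)

    # Text side: HotFlip-style first-order substitution at the most
    # gradient-sensitive position, evaluated against the perturbed image.
    embeds = model.txt_embed(token_ids).detach().requires_grad_(True)
    loss = -F.cosine_similarity(model.joint(adv_image, embeds), clean, dim=-1).mean()
    loss.backward()
    grad = embeds.grad                                   # (B, L, txt_dim)
    adv_ids = token_ids.clone()
    with torch.no_grad():
        pos = grad.norm(dim=-1).argmax(dim=1)            # most influential token
        vocab = model.txt_embed.weight                   # (V, txt_dim)
        for b in range(token_ids.size(0)):
            p = pos[b].item()
            # First-order estimate of loss increase for each replacement token.
            scores = (vocab - embeds[b, p]) @ grad[b, p]
            adv_ids[b, p] = scores.argmax()              # larger loss = less similar
    return adv_image.detach(), adv_ids

A minimal usage example, with random inputs standing in for real data:

model = SurrogateJointEncoder()
img = torch.rand(1, 3, 224, 224)
ids = torch.randint(0, 30522, (1, 16))
adv_img, adv_ids = coordinated_attack(model, img, ids)

The key design point the sketch tries to capture is coordination: one surrogate loss produces the gradients for both modalities, so the image and text perturbations reinforce each other rather than being optimized independently.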

@article{babu2025_2506.05429,
  title={Coordinated Robustness Evaluation Framework for Vision-Language Models},
  author={Ashwin Ramesh Babu and Sajad Mousavi and Vineet Gundecha and Sahand Ghorbanpour and Avisek Naug and Antonio Guillen and Ricardo Luna Gutierrez and Soumyendu Sarkar},
  journal={arXiv preprint arXiv:2506.05429},
  year={2025}
}