Adversarial Prompt Distillation for Vision-Language Models

Large pre-trained Vision-Language Models (VLMs) such as Contrastive Language-Image Pre-training (CLIP) have been shown to be susceptible to adversarial attacks, raising concerns about their deployment in safety-critical applications like autonomous driving and medical diagnosis. One promising approach to robustifying pre-trained VLMs is Adversarial Prompt Tuning (APT), which applies adversarial training during prompt tuning. However, existing APT methods are mostly unimodal, designing prompts for only the visual or the textual modality, which limits their effectiveness in robustness, clean accuracy, or both. In this work, we propose Adversarial Prompt Distillation (APD), a bimodal knowledge distillation framework that enhances APT with multi-modal knowledge transfer. APD optimizes prompts for both the visual and textual modalities while distilling knowledge from a clean pre-trained teacher CLIP model. Extensive experiments on multiple benchmark datasets demonstrate that APD outperforms current state-of-the-art APT methods in both adversarial robustness and clean accuracy. The effectiveness of APD also validates the possibility of using a non-robust teacher to improve the generalization and robustness of fine-tuned VLMs.
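The abstract describes the training objective only at a high level. The following is a minimal PyTorch sketch of one plausible instantiation, assuming a PGD inner loop to craft the adversarial examples and a temperature-scaled KL term that distills the clean teacher's soft predictions into the student. All names and hyperparameters here (`pgd_attack`, `apd_step`, `eps`, `lam`, `tau`) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(student, images, labels, eps=8/255, alpha=2/255, steps=3):
    # Craft L-inf bounded perturbations that maximize the student's
    # classification loss (standard PGD; step sizes are assumptions).
    delta = torch.empty_like(images).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(student(images + delta), labels)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        # Keep the perturbed image inside the valid pixel range.
        delta = ((images + delta).clamp(0, 1) - images).detach()
    return (images + delta).detach()

def apd_step(student, teacher, images, labels, lam=1.0, tau=1.0):
    # One APD-style update: adversarial cross-entropy on the student,
    # plus KL distillation toward the clean teacher's soft predictions.
    # Only the student's visual/textual prompt parameters are assumed
    # trainable; all backbone weights stay frozen.
    adv_images = pgd_attack(student, images, labels)
    adv_logits = student(adv_images)
    with torch.no_grad():
        teacher_logits = teacher(images)  # the teacher sees clean inputs
    ce = F.cross_entropy(adv_logits, labels)
    kl = F.kl_div(F.log_softmax(adv_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau ** 2
    return ce + lam * kl
```

In this reading, `student` and `teacher` would both be CLIP models whose logits come from the cosine similarity between image embeddings and (prompted) class-text embeddings, and the optimizer passed the loss from `apd_step` would hold only the prompt parameters.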
@article{luo2025_2411.15244,
  title={Adversarial Prompt Distillation for Vision-Language Models},
  author={Lin Luo and Xin Wang and Bojia Zi and Shihao Zhao and Xingjun Ma and Yu-Gang Jiang},
  journal={arXiv preprint arXiv:2411.15244},
  year={2025}
}