
MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models

Abstract

Although CLIP-based prompt tuning significantly enhances pre-trained Vision-Language Models, existing research mainly focuses on reconstructing the model architecture, e.g., adding extra loss terms and meta-networks. These approaches generally increase complexity and training cost. To keep the tuning process efficient, we propose Model-Agnostic Optimization (MAO), a plug-and-play framework for prompt tuning. Without altering any component of the prompt-tuning backbone, we introduce a Data-Driven Enhancement framework to optimize the distribution of the initial data, and incorporate an Alterable Regularization module to strengthen the task-specific feature processing pipeline, thereby improving overall performance while maintaining low computational cost. Extensive experiments demonstrate MAO's strong performance and efficiency. The code of MAO is available at: this https URL.
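
The abstract only names the two modules, so the following is a minimal, hypothetical sketch of how a plug-and-play wrapper of this kind could look around an arbitrary CLIP prompt-tuning backbone (e.g., a CoOp-style model). The function names, the class-balanced resampling used to stand in for Data-Driven Enhancement, and the annealed L2 penalty used to stand in for Alterable Regularization are all assumptions for illustration, not the paper's actual algorithm.

# Hypothetical sketch: model-agnostic optimization around an unmodified
# prompt-tuning backbone. All design choices below are assumptions.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, WeightedRandomSampler


def build_balanced_loader(dataset, labels, batch_size=32):
    """Stand-in for Data-Driven Enhancement: resample the few-shot data so
    under-represented classes are drawn more often, reshaping the input
    distribution without touching the backbone."""
    labels_t = torch.as_tensor(labels)
    counts = torch.bincount(labels_t)
    weights = 1.0 / counts[labels_t].float()
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)


def alterable_regularization(prompt_params, step, total_steps, max_weight=1e-3):
    """Stand-in for Alterable Regularization: an L2 penalty on the learnable
    prompt vectors whose strength is annealed over training."""
    weight = max_weight * (1.0 - step / total_steps)
    return weight * sum(p.pow(2).sum() for p in prompt_params)


def tune(backbone, dataset, labels, total_steps=1000, lr=2e-3):
    """Model-agnostic tuning loop: `backbone` is any prompt-tuning model that
    maps images to class logits; only its trainable prompt parameters are updated."""
    loader = build_balanced_loader(dataset, labels)
    prompt_params = [p for p in backbone.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(prompt_params, lr=lr)

    step = 0
    while step < total_steps:
        for images, targets in loader:
            logits = backbone(images)            # backbone itself is left unchanged
            loss = F.cross_entropy(logits, targets)
            loss = loss + alterable_regularization(prompt_params, step, total_steps)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= total_steps:
                break
    return backbone

The point of the sketch is the interface: both modules act only on the data pipeline and the loss value, so any prompt-tuning backbone can be dropped in without architectural changes.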

@article{li2025_2503.18160,
  title={MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models},
  author={Haoyang Li and Siyu Zhou and Liang Wang and Guodong Long},
  journal={arXiv preprint arXiv:2503.18160},
  year={2025}
}