MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models

Although CLIP-based prompt tuning significantly enhances pre-trained Vision-Language Models, existing research has focused on reconstructing the model architecture, e.g., by adding loss computations and meta-networks. These approaches generally increase complexity and extend training cost. To keep the tuning process efficient, we propose plug-and-play Model-Agnostic Optimization (MAO) for prompt tuning. Without altering any components of the prompt tuning backbone, we introduce a Data-Driven Enhancement framework to optimize the distribution of the initial data, and incorporate an Alterable Regularization module to boost the task-specific feature processing pipeline, thereby improving overall performance while maintaining low computational cost. Extensive experiments demonstrate that MAO achieves outstanding performance and efficiency. The code of MAO is available at: this https URL.
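For illustration only, the following is a minimal PyTorch-style sketch of the plug-and-play idea described above: an existing CoOp-style prompt-tuning backbone is used unchanged, while a data re-weighting step and an auxiliary regularization term act on the training loop. The names (`data_driven_reweight`, `prompt_parameters`, `reg_weight`) and the concrete forms of both modules are assumptions for the sketch, not MAO's actual formulation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: wrap an existing prompt-tuning backbone (e.g., a CoOp-style
# CLIP model with learnable context prompts) without altering its internals.
# The concrete forms of "Data-Driven Enhancement" and "Alterable Regularization"
# below are illustrative placeholders, not the paper's design.

def data_driven_reweight(logits, labels):
    """Assumed data-driven enhancement: up-weight hard examples based on the
    backbone's own confidence (illustrative only)."""
    with torch.no_grad():
        probs = F.softmax(logits, dim=-1)
        conf = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
        weights = (1.0 - conf).clamp(min=0.1)   # harder samples get larger weight
        weights = weights / weights.mean()      # keep the loss scale stable
    return weights

def training_step(backbone, images, labels, reg_weight=1e-3):
    """One optimization step; only the learnable prompt vectors receive gradients."""
    logits = backbone(images)                   # backbone is used as-is
    weights = data_driven_reweight(logits, labels)
    cls_loss = (weights * F.cross_entropy(logits, labels, reduction="none")).mean()

    # Assumed alterable regularization: an L2 penalty on the prompt parameters
    # whose strength could be adjusted per task; purely a placeholder here.
    reg_loss = sum(p.pow(2).sum() for p in backbone.prompt_parameters())
    return cls_loss + reg_weight * reg_loss
```

Since both modules operate outside the backbone, this kind of loop can in principle be dropped onto different prompt-tuning methods, which is the sense in which the optimization is model-agnostic.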
@article{li2025_2503.18160,
  title   = {MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models},
  author  = {Haoyang Li and Siyu Zhou and Liang Wang and Guodong Long},
  journal = {arXiv preprint arXiv:2503.18160},
  year    = {2025}
}