Learning Generalizable Prompt for CLIP with Class Similarity Knowledge

Abstract

In vision-language models (VLMs), prompt tuning has shown its effectiveness in adapting models to downstream tasks. However, learned prompts struggle to generalize to unseen classes, as they tend to overfit to the classes that are targeted during prompt tuning. Examining failure cases, we observed that learned prompts disrupt the semantics of unseen classes, generating text embeddings with incorrect semantic relationships among classes. To address this, we propose Similarity Alignment Regularization (SAR), which regularizes learnable prompts to preserve the semantic relationships among classes captured by hand-crafted prompts. Specifically, we first obtain novel classes related to base classes using ChatGPT-4o and utilize them as potential unseen classes during prompt tuning. Then, by targeting both base and novel classes, SAR aligns the similarity relationships among text embeddings generated by learnable prompts with the similarity relationships from hand-crafted prompts. Extensive experiments applying SAR to existing prompt tuning methods demonstrate its effectiveness in improving generalization to unseen classes.
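The core alignment idea can be sketched as a small penalty term: compare the pairwise cosine-similarity matrix of text embeddings produced by the learnable prompts against the one produced by hand-crafted prompts, over base and novel classes together. The function name `sar_loss` and the mean-squared-error form of the penalty are illustrative assumptions; the paper's exact loss may differ.

```python
import numpy as np

def cosine_sim_matrix(E):
    """Row-normalize embeddings, then return pairwise cosine similarities."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E @ E.T

def sar_loss(E_learn, E_hand):
    """Hypothetical similarity-alignment penalty (a sketch, not the paper's
    exact loss): mean squared difference between the class-similarity
    matrices of learnable-prompt and hand-crafted-prompt text embeddings.
    Rows of each matrix are per-class text embeddings for the combined
    base + novel class set."""
    S_learn = cosine_sim_matrix(E_learn)
    S_hand = cosine_sim_matrix(E_hand)
    return float(np.mean((S_learn - S_hand) ** 2))
```

In training, this penalty would be added to the usual prompt-tuning objective, so the learnable prompts fit the base classes while being pulled toward the semantic relationships that hand-crafted prompts encode.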

@article{jung2025_2502.11969,
  title={Learning Generalizable Prompt for CLIP with Class Similarity Knowledge},
  author={Sehun Jung and Hyang-won Lee},
  journal={arXiv preprint arXiv:2502.11969},
  year={2025}
}