ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.22187
26
0

Active Learning for Vision-Language Models

29 October 2024
Bardia Safaei
Vishal M. Patel
    VLM
ArXivPDFHTML
Abstract

Pre-trained vision-language models (VLMs) like CLIP have demonstrated impressive zero-shot performance on a wide range of downstream computer vision tasks. However, there still exists a considerable performance gap between these models and a supervised deep model trained on a downstream dataset. To bridge this gap, we propose a novel active learning (AL) framework that enhances the zero-shot classification performance of VLMs by selecting only a few informative samples from the unlabeled data for annotation during training. To achieve this, our approach first calibrates the predicted entropy of VLMs and then utilizes a combination of self-uncertainty and neighbor-aware uncertainty to calculate a reliable uncertainty measure for active sample selection. Our extensive experiments show that the proposed approach outperforms existing AL approaches on several image classification datasets, and significantly enhances the zero-shot performance of VLMs.

View on arXiv
Comments on this paper