Taxonomy-Aware Evaluation of Vision-Language Models

7 April 2025
Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr, Serge J. Belongie, Ryan Cotterell, Nico Lang, Stella Frank
Abstract

When a vision-language model (VLM) is prompted to identify an entity depicted in an image, it may answer 'I see a conifer,' rather than the specific label 'norway spruce'. This raises two issues for evaluation: First, the unconstrained generated text needs to be mapped to the evaluation label space (i.e., 'conifer'). Second, a useful classification measure should give partial credit to less-specific, but not incorrect, answers ('norway spruce' being a type of 'conifer'). To meet these requirements, we propose a framework for evaluating unconstrained text predictions, such as those generated from a vision-language model, against a taxonomy. Specifically, we propose the use of hierarchical precision and recall measures to assess the level of correctness and specificity of predictions with regard to a taxonomy. Experimentally, we first show that existing text similarity measures do not capture taxonomic similarity well. We then develop and compare different methods to map textual VLM predictions onto a taxonomy. This allows us to compute hierarchical similarity measures between the generated text and the ground truth labels. Finally, we analyze modern VLMs on fine-grained visual classification tasks based on our proposed taxonomic evaluation scheme.
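To make the hierarchical precision and recall idea concrete, below is a minimal sketch based on ancestor-set overlap over a toy taxonomy. The parent map, label names, and this exact set-based formulation are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of set-based hierarchical precision/recall over a taxonomy.
# The toy taxonomy and node names below are illustrative assumptions.

# Child -> parent edges of a tiny plant taxonomy (the root has no parent).
PARENT = {
    "norway spruce": "spruce",
    "spruce": "conifer",
    "scots pine": "pine",
    "pine": "conifer",
    "conifer": "plant",
}

def ancestors(node: str) -> set[str]:
    """Return the node together with all of its ancestors up to the root."""
    out = {node}
    while node in PARENT:
        node = PARENT[node]
        out.add(node)
    return out

def hierarchical_prf(pred: str, gold: str) -> tuple[float, float, float]:
    """Hierarchical precision, recall, and F1 from overlapping ancestor sets."""
    p_anc, g_anc = ancestors(pred), ancestors(gold)
    overlap = len(p_anc & g_anc)
    precision = overlap / len(p_anc)
    recall = overlap / len(g_anc)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    # A less specific but not incorrect answer gets partial credit:
    # predicting "conifer" for an image of a "norway spruce".
    print(hierarchical_prf("conifer", "norway spruce"))      # (1.0, 0.5, ...)
    # A prediction on the wrong branch is penalised on both measures.
    print(hierarchical_prf("scots pine", "norway spruce"))   # (0.5, 0.5, ...)
```

In this sketch, an over-general prediction keeps perfect precision but loses recall, while a prediction on a different branch of the taxonomy is penalised on both measures.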

@article{snæbjarnarson2025_2504.05457,
  title={ Taxonomy-Aware Evaluation of Vision-Language Models },
  author={ Vésteinn Snæbjarnarson and Kevin Du and Niklas Stoehr and Serge Belongie and Ryan Cotterell and Nico Lang and Stella Frank },
  journal={arXiv preprint arXiv:2504.05457},
  year={ 2025 }
}