VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

28 August 2024
M. Maruf
Arka Daw
Kazi Sajeed Mehrab
Harish Babu Manogaran
Abhilash Neog
Medha Sawhney
Mridul Khurana
James P. Balhoff
Yasin Bakiş
Bahadir Altintas
Matthew J. Thompson
Elizabeth G. Campolongo
Josef C. Uyeda
Hilmar Lapp
Henry L. Bart
Paula M. Mabee
Yu-Chuan Su
Wei-Lun Chao
Charles V. Stewart
Tanya Berger-Wolf
Wasila Dahdul
Anuj Karpatne
Abstract

Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask whether pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms: fishes, birds, and butterflies, covering five biologically relevant tasks. We also explore the effects of prompting techniques and tests for reasoning hallucination on the performance of VLMs, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions using images. The code and datasets for running all the analyses reported in this paper can be found at https://github.com/sammarfy/VLM4Bio.
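To make the zero-shot evaluation setup concrete, below is a minimal sketch of the kind of image-question query the benchmark poses to a pre-trained VLM without fine-tuning. It uses BLIP from Hugging Face transformers purely as a stand-in model (the paper's actual model list and data loaders live in the linked repository), and the field names `image_path`, `question`, and `answer` are illustrative assumptions, not the dataset's real schema.

from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Stand-in open VLM; the paper evaluates 12 SOTA models via its own harness.
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def answer_question(image_path: str, question: str) -> str:
    """Query the VLM zero-shot: one image, one question, no fine-tuning."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    return processor.decode(output_ids[0], skip_special_tokens=True)

# Hypothetical QA pair in the spirit of VLM4Bio's trait-identification tasks.
qa = {"image_path": "fish_001.jpg",
      "question": "Does this fish have an adipose fin?",
      "answer": "yes"}
prediction = answer_question(qa["image_path"], qa["question"])
correct = prediction.strip().lower() == qa["answer"]

The exact-match check at the end is a simplification for illustration; the benchmark's five tasks, prompting variants, and hallucination tests are defined by the evaluation code in the repository above.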
