Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy

International Conference on Learning Representations (ICLR), 2024

11 February 2024

Thomas Brox

Papers citing "Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy"

Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Governance

228

25 Sep 2025

Object Detection with Multimodal Large Vision-Language Models: An In-depth Review

Information Fusion (Inf. Fusion), 2025

Ranjan Sapkota

Manoj Karkee

ObjD VLM

287

25 Aug 2025

VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

258

29 May 2025

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

...

1.1K

05 May 2025

Directional Gradient Projection for Robust Fine-Tuning of Foundation Models

International Conference on Learning Representations (ICLR), 2025

Chengyue Huang

Junjiao Tian

Brisa Maneechotesuwan

Shivang Chopra

Z. Kira

503

21 Feb 2025

Patent Figure Classification using Large Vision-language Models

European Conference on Information Retrieval (ECIR), 2025

Sushil Awale

Eric Müller-Budack

Ralph Ewerth

203

22 Jan 2025

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Computer Vision and Pattern Recognition (CVPR), 2024

...

Baris Turkbey

Holger Roth

Daguang Xu

VLM

535

19 Nov 2024