
LLaVA-Critic: Learning to Evaluate Multimodal Models

Abstract

We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as a generalist evaluator to assess performance across a wide range of multimodal tasks. LLaVA-Critic is trained using a high-quality critic instruction-following dataset that incorporates diverse evaluation criteria and scenarios. Our experiments demonstrate the model's effectiveness in two key areas: (1) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation scores, performing on par with or surpassing GPT models on multiple evaluation benchmarks; and (2) Preference Learning, where it generates reward signals for preference learning, enhancing model alignment capabilities. This work underscores the potential of open-source LMMs in self-critique and evaluation, setting the stage for future research into scalable, superhuman alignment feedback mechanisms for LMMs.
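
To make the two use cases concrete, the sketch below shows how a single pointwise "LMM-as-a-Judge" call might look. It is a minimal sketch, assuming a Hugging Face transformers-compatible checkpoint; the checkpoint identifier, the judging prompt, and the 1-10 scale are illustrative assumptions on our part, not the paper's exact evaluation protocol.

# Minimal LMM-as-a-Judge sketch. The checkpoint name is an assumption (the
# released LLaVA-Critic weights may need conversion to the transformers
# format), and the prompt/scoring scale are illustrative, not the paper's
# exact protocol.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

MODEL_ID = "lmms-lab/llava-critic-7b"  # assumed identifier; verify before use

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def judge(image: Image.Image, question: str, answer: str) -> str:
    """Ask the critic to score one candidate answer with a justification."""
    prompt = (
        "You are evaluating a multimodal assistant.\n"
        f"Question: {question}\n"
        f"Candidate answer: {answer}\n"
        "Rate the answer from 1 to 10 and briefly justify your score."
    )
    conversation = [{
        "role": "user",
        "content": [{"type": "image"}, {"type": "text", "text": prompt}],
    }]
    text = processor.apply_chat_template(conversation, add_generation_prompt=True)
    inputs = processor(images=image, text=text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    # Strip the prompt tokens and return only the critic's judgment.
    return processor.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

The same call doubles as a source of reward signals for preference learning: score two candidate responses to the same question and treat the higher-scored one as the preferred example in a preference pair.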

@article{xiong2025_2410.02712,
  title={LLaVA-Critic: Learning to Evaluate Multimodal Models},
  author={Tianyi Xiong and Xiyao Wang and Dong Guo and Qinghao Ye and Haoqi Fan and Quanquan Gu and Heng Huang and Chunyuan Li},
  journal={arXiv preprint arXiv:2410.02712},
  year={2025}
}