MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

28 February 2025
Peijie Wang
Zhong-Zhi Li
Fei Yin
Xin Yang
Dekang Ran
Cheng-Lin Liu
Abstract

Multimodal Large Language Models (MLLMs) have shown promising capabilities in mathematical reasoning within visual contexts across various datasets. However, most existing multimodal math benchmarks are limited to single-visual contexts, which diverges from the multi-visual scenarios commonly encountered in real-world mathematical applications. To address this gap, we introduce MV-MATH: a meticulously curated dataset of 2,009 high-quality mathematical problems. Each problem integrates multiple images interleaved with text, derived from authentic K-12 scenarios, and enriched with detailed annotations. MV-MATH includes multiple-choice, free-form, and multi-step questions, covering 11 subject areas across 3 difficulty levels, and serves as a comprehensive and rigorous benchmark for assessing MLLMs' mathematical reasoning in multi-visual contexts. Through extensive experimentation, we observe that MLLMs encounter substantial challenges in multi-visual math tasks, with a considerable performance gap relative to human capabilities on MV-MATH. Furthermore, we analyze the performance and error patterns of various models, providing insights into MLLMs' mathematical reasoning capabilities within multi-visual settings.
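
To make the dataset structure described above concrete, the following is a minimal sketch in Python of how a single MV-MATH record might be represented and loaded. The field names, image-placeholder convention, and file layout are assumptions inferred from the abstract, not the dataset's published schema.

import json
from dataclasses import dataclass, field
from typing import List

@dataclass
class MVMathProblem:
    # Hypothetical schema; the actual MV-MATH release may use different fields.
    problem_id: str
    question: str              # problem text with image placeholders, e.g. "<image_1>"
    image_paths: List[str]     # multiple images interleaved with the question text
    question_type: str         # "multiple-choice" | "free-form" | "multi-step"
    subject: str               # one of the 11 subject areas
    difficulty: str            # one of the 3 difficulty levels
    answer: str
    options: List[str] = field(default_factory=list)  # empty for free-form questions

def load_problems(path: str) -> List[MVMathProblem]:
    # Assumes a JSON file containing a list of problem objects.
    with open(path, encoding="utf-8") as f:
        return [MVMathProblem(**record) for record in json.load(f)]

A loader like this makes the benchmark's key property explicit: each record carries a list of images rather than a single one, so an evaluation harness must feed all of a problem's images, interleaved with the text, to the model under test.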

@article{wang2025_2502.20808,
  title={MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts},
  author={Peijie Wang and Zhong-Zhi Li and Fei Yin and Xin Yang and Dekang Ran and Cheng-Lin Liu},
  journal={arXiv preprint arXiv:2502.20808},
  year={2025}
}