MAVERIX: Multimodal Audio-Visual Evaluation Reasoning IndeX

27 March 2025
Liuyue Xie
George Z. Wei
Avik Kuthiala
Ce Zheng
Ananya Bal
Mosam Dabhi
Liting Wen
Taru Rustagi
Ethan Lai
Sushil Khyalia
Rohan Choudhury
Morteza Ziyadi
Xu Zhang
Hao Yang
László A. Jeni
Abstract

Frontier models have either been language-only or have focused primarily on vision and language modalities. Although models with combined vision and audio understanding have recently made substantial progress, the field still lacks a standardized evaluation framework for thoroughly assessing their cross-modality perception. We introduce MAVERIX (Multimodal Audio-Visual Evaluation Reasoning IndeX), a novel benchmark of 700 videos and 2,556 questions explicitly designed to evaluate multimodal models through tasks that require close integration of video and audio information. MAVERIX uniquely provides models with audiovisual tasks, closely mimicking the multimodal perceptual experience available to humans during inference and decision making. To our knowledge, MAVERIX is the first benchmark aimed explicitly at assessing comprehensive audiovisual integration. Experiments with state-of-the-art models, including Gemini 1.5 Pro and o1, show performance approaching human levels (around 70% accuracy), while human experts reach near-ceiling performance (95.1%). With standardized evaluation protocols, a rigorously annotated pipeline, and a public toolkit, MAVERIX establishes a challenging testbed for advancing audiovisual multimodal intelligence.
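
The abstract does not describe the toolkit's interface, so the sketch below is only a rough illustration of the kind of multiple-choice audiovisual evaluation implied by the reported accuracy numbers: each question pairs a video clip with its audio track, a model selects one of the answer choices, and overall accuracy is the fraction of correct selections. The AVQuestion fields and the predict callable are hypothetical placeholders, not the MAVERIX toolkit API.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class AVQuestion:
    video_path: str       # path to the video clip (visual stream)
    audio_path: str       # path to the paired audio track
    question: str         # question requiring joint audio-visual reasoning
    choices: list[str]    # multiple-choice answer options
    answer_index: int     # index of the ground-truth choice

def evaluate(questions: Iterable[AVQuestion],
             predict: Callable[[AVQuestion], int]) -> float:
    # Overall accuracy: the fraction of questions whose predicted choice
    # index matches the annotated ground truth.
    total = 0
    correct = 0
    for q in questions:
        total += 1
        if predict(q) == q.answer_index:
            correct += 1
    return correct / total if total else 0.0

Under this framing, a model wrapper would implement predict by feeding both streams and the question text to the model and returning the index of its selected choice; the reported figures (around 70% for state-of-the-art models, 95.1% for human experts) are averages of per-question scores of this kind.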

@article{xie2025_2503.21699,
  title={MAVERIX: Multimodal Audio-Visual Evaluation Reasoning IndeX},
  author={Liuyue Xie and George Z. Wei and Avik Kuthiala and Ce Zheng and Ananya Bal and Mosam Dabhi and Liting Wen and Taru Rustagi and Ethan Lai and Sushil Khyalia and Rohan Choudhury and Morteza Ziyadi and Xu Zhang and Hao Yang and László A. Jeni},
  journal={arXiv preprint arXiv:2503.21699},
  year={2025}
}