MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

24 April 2024

Jin Wang

Ping Luo

Papers citing "MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI"

50 / 69 papers shown

Jina-VLM: Small Multilingual Vision Language Model

359

03 Dec 2025

Multimodal Reinforcement Learning with Agentic Verifier for AI Agents

...

192

03 Dec 2025

AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs

223

26 Nov 2025

SPHINX: A Synthetic Environment for Visual Perception and Reasoning

312

25 Nov 2025

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Yiming Qin

Bomin Wei

Jiaxin Ge

Konstantinos Kallidromitis

260

24 Nov 2025

NVIDIA Nemotron Nano V2 VL

Nvidia

Amala Sanjay Deshmukh

...

313

06 Nov 2025

Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

...

351

28 Oct 2025

PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection

512

27 Oct 2025

VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data

Tingqiao Xu

Ziru Zeng

Jiayu Chen

17 Oct 2025

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution

186

14 Oct 2025

Improving Temporal Understanding Logic Consistency in Video-Language Models via Attention Enhancement

107

09 Oct 2025

AstroMMBench: A Benchmark for Evaluating Multimodal Large Language Models Capabilities in Astronomy

194

29 Sep 2025

OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment

Teng Xiao

Zuchao Li

Lefei Zhang

184

23 Sep 2025

ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models

212

19 Sep 2025

Qianfan-VL: Domain-Enhanced Universal Vision-Language Models

...

19 Sep 2025

A Multi-To-One Interview Paradigm for Efficient MLLM Evaluation

140

18 Sep 2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

...

198

16 Sep 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

...

305

279

25 Aug 2025

HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes

...

182

19 Aug 2025

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

...

340

07 Aug 2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

...

241

06 Aug 2025

Evaluating Variance in Visual Question Answering Benchmarks

Nikitha SR

LRM

160

04 Aug 2025

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation

327

30 Jul 2025

MOVE: Motion-Guided Few-Shot Video Object Segmentation

244

29 Jul 2025

MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models

...

193

20 Jun 2025

Aligning MLLM Benchmark With Human Preferences via Structural Equation Modeling

111

13 Jun 2025

Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

298

12 Jun 2025

CoMemo: LVLMs Need Image Context with Image Memory

218

06 Jun 2025

MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos

211

04 Jun 2025

Abstractive Visual Understanding of Multi-modal Structured Knowledge: A New Perspective for MLLM Evaluation

209

02 Jun 2025

Affordance Benchmark for MLLMs

247

01 Jun 2025

Improve MLLM Benchmark Efficiency through Interview

225

01 Jun 2025

AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs

262

27 May 2025

Bias and Generalizability of Foundation Models across Datasets in Breast Mammography

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

339

14 May 2025

SITE: towards Spatial Intelligence Thorough Evaluation

293

08 May 2025

Towards Explainable Fake Image Detection with Multi-Modal Large Language Models

512

19 Apr 2025

Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models

...

471

16 Apr 2025

Resampling Benchmark for Efficient Comprehensive Evaluation of Large Vision-Language Models

Teppei Suzuki

Keisuke Ozawa

VLM

483

14 Apr 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

...

619

806

14 Apr 2025

MM-IFEngine: Towards Multimodal Instruction Following

520

10 Apr 2025

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Computer Vision and Pattern Recognition (CVPR), 2025

418

19 Mar 2025

Aligning Multimodal LLM with Human Preference: A Survey

...

833

18 Mar 2025

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Computer Vision and Pattern Recognition (CVPR), 2025

288

11 Mar 2025

ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

...

948

09 Mar 2025

VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering

214

09 Mar 2025

MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

Computer Vision and Pattern Recognition (CVPR), 2025

592

28 Feb 2025

From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education

359

20 Feb 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

...

453

13 Feb 2025

Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

369

19 Jan 2025

FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

464

17 Dec 2024