MMBench: Is Your Multi-modal Model an All-around Player?

European Conference on Computer Vision (ECCV), 2024

12 July 2023

Conghui He

Ziwei Liu

Kai-xiang Chen

Dahua Lin

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 686 papers shown

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

156

16 Oct 2025

CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection

369

16 Oct 2025

Train a Unified Multimodal Data Quality Classifier with Synthetic Data

16 Oct 2025

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

101

16 Oct 2025

Vision-Centric Activation and Coordination for Multimodal Large Language Models

359

16 Oct 2025

VisCoP: Visual Probing for Video Domain Adaptation of Vision Language Models

151

15 Oct 2025

NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching

241

15 Oct 2025

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

...

428

15 Oct 2025

MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites

...

248

14 Oct 2025

VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage

A. Alfarano

L. Venturoli

D. Negueruela del Castillo

CoGe VLM

201

14 Oct 2025

SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models

208

14 Oct 2025

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution

186

14 Oct 2025

DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search

261

14 Oct 2025

Prompt-Guided Spatial Understanding with RGB-D Transformers for Fine-Grained Object Relation Reasoning

122

13 Oct 2025

A Survey on Agentic Multimodal Large Language Models

...

LM&Ro AIFin AI4TS LRM AI4CE

250

13 Oct 2025

FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models

268

13 Oct 2025

Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization

131

13 Oct 2025

ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?

116

13 Oct 2025

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

...

155

12 Oct 2025

UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation

...

183

12 Oct 2025

CoIDO: Efficient Data Selection for Visual Instruction Tuning via Coupled Importance-Diversity Optimization

119

11 Oct 2025

Unleashing Perception-Time Scaling to Multimodal Reasoning Models

146

10 Oct 2025

BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception

121

10 Oct 2025

MomentSeg: Moment-Centric Sampling for Enhanced Video Pixel Understanding

281

10 Oct 2025

LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition

117

10 Oct 2025

UniVideo: Unified Understanding, Generation, and Editing for Videos

262

09 Oct 2025

The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators

275

08 Oct 2025

AVO: Amortized Value Optimization for Contact Mode Switching in Multi-Finger Manipulation

Adam Hung

Fan Yang

Abhinav Kumar

Sergio Aguilera Marinovic

Soshi Iba

Rana Soltani Zarrin

Dmitry Berenson

123

08 Oct 2025

Automated Repeatable Adversary Threat Emulation with Effects Language (EL)

Suresh Damodaran

Paul D. Rowe

AAML

132

07 Oct 2025

The Artificial Intelligence Cognitive Examination: A Survey on the Evolution of Multimodal Evaluation from Recognition to Reasoning

Mayank Ravishankara

Varindra V. Persad Maharaj

ELM

202

05 Oct 2025

AgriGPT-VL: Agricultural Vision-Language Understanding Suite

...

311

05 Oct 2025

What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models

138

05 Oct 2025

Don't Just Chase "Highlighted Tokens" in MLLMs: Revisiting Visual Holistic Context Retention

281

03 Oct 2025

ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models

Krishna Teja Chitty-Venkata

M. Emani

MLLM VGen LRM VLM

190

02 Oct 2025

Growing Visual Generative Capacity for Pre-Trained MLLMs

201

02 Oct 2025

RefineShot: Rethinking Cinematography Understanding with Foundational Skill Evaluation

175

02 Oct 2025

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

...

176

01 Oct 2025

Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs

01 Oct 2025

Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories

Baharan Mirzasoleiman

104

01 Oct 2025

VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs

215

30 Sep 2025

Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training

201

30 Sep 2025

Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models

...

241

30 Sep 2025

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

...

258

29 Sep 2025

Mitigating Visual Hallucinations via Semantic Curriculum Preference Optimization in MLLMs

132

29 Sep 2025

From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models

...

448

29 Sep 2025

Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models

223

29 Sep 2025

HIVTP: A Training-Free Method to Improve VLMs Efficiency via Hierarchical Visual Token Pruning Using Middle-Layer-Based Importance Score

166

28 Sep 2025

Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding

187

27 Sep 2025

GaussianVision: Vision-Language Alignment from Compressed Image Representations using 2D Gaussian Splatting

291

26 Sep 2025

Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching

159

26 Sep 2025