MMBench: Is Your Multi-modal Model an All-around Player?

European Conference on Computer Vision (ECCV), 2023

12 July 2023

Conghui He

Ziwei Liu

Kai-xiang Chen

Dahua Lin

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 687 papers shown

Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact

249

18 Jun 2025

Show-o2: Improved Native Unified Multimodal Models

476

18 Jun 2025

SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks

379

17 Jun 2025

Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence

268

16 Jun 2025

Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model

271

16 Jun 2025

Aligning MLLM Benchmark With Human Preferences via Structural Equation Modeling

111

13 Jun 2025

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

400

12 Jun 2025

Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation

361

12 Jun 2025

Vision Generalist Model: A Survey

International Journal of Computer Vision (IJCV), 2025

293

11 Jun 2025

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better

210

10 Jun 2025

Synthetic Visual Genome

Computer Vision and Pattern Recognition (CVPR), 2025

212

09 Jun 2025

SUDER: Self-Improving Unified Large Multimodal Models for Understanding and Generation with Dual Self-Rewards

236

09 Jun 2025

WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

206

09 Jun 2025

SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning

181

08 Jun 2025

Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning

277

08 Jun 2025

CoMemo: LVLMs Need Image Context with Image Memory

218

06 Jun 2025

ExAct: A Video-Language Benchmark for Expert Action Analysis

Oluwatumininu Oguntola

Gedas Bertasius

202

06 Jun 2025

MokA: Multimodal Low-Rank Adaptation for MLLMs

274

05 Jun 2025

Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations

234

05 Jun 2025

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

474

05 Jun 2025

Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction

186

05 Jun 2025

MiMo-VL Technical Report

258

04 Jun 2025

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

332

03 Jun 2025

PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models

Computer Vision and Pattern Recognition (CVPR), 2025

232

03 Jun 2025

Learning Sparsity for Effective and Efficient Music Performance Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

244

02 Jun 2025

MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping

284

02 Jun 2025

K12Vista: Exploring the Boundaries of MLLMs in K-12 Education

189

02 Jun 2025

NavBench: Probing Multimodal Large Language Models for Embodied Navigation

250

01 Jun 2025

Improve MLLM Benchmark Efficiency through Interview

225

01 Jun 2025

GuessBench: Sensemaking Multimodal Creativity in the Wild

311

01 Jun 2025

Affordance Benchmark for MLLMs

247

01 Jun 2025

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

203

30 May 2025

Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents

276

30 May 2025

SORCE: Small Object Retrieval in Complex Environments

148

30 May 2025

When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways

222

30 May 2025

Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts

179

30 May 2025

Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap

Wenhan Yang

Spencer Stice

Ali Payani

Baharan Mirzasoleiman

MLLM

222

30 May 2025

Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck

Yuwen Tan

Yuan Qing

Boqing Gong

280

30 May 2025

Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information

326

29 May 2025

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

334

29 May 2025

Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation

182

29 May 2025

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models

215

28 May 2025

Sherlock: Self-Correcting Reasoning in Vision-Language Models

Yi Ding

Ruqi Zhang

ReLM LRM VLM

256

28 May 2025

VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models

312

28 May 2025

Zero-Shot Vision Encoder Grafting via LLM Surrogates

235

28 May 2025

Spatial Knowledge Graph-Guided Multimodal Synthesis

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

347

28 May 2025

AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs

262

27 May 2025

Evaluating and Steering Modality Preferences in Multimodal Large Language Model

380

27 May 2025

Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models

385

27 May 2025

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

394

26 May 2025