MMBench: Is Your Multi-modal Model an All-around Player?

European Conference on Computer Vision (ECCV), 2024

12 July 2023

Conghui He

Ziwei Liu

Kai-xiang Chen

Dahua Lin

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 685 papers shown

ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models

103

26 Sep 2025

Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching

159

26 Sep 2025

Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety

127

26 Sep 2025

OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment

Teng Xiao

Zuchao Li

Lefei Zhang

178

23 Sep 2025

How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective

...

319

23 Sep 2025

Rule Encoding and Compliance in Large Language Models: An Information-Theoretic Analysis

Joachim Diederich

204

23 Sep 2025

Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models

126

23 Sep 2025

BaseReward: A Strong Baseline for Multimodal Reward Model

...

128

19 Sep 2025

ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models

211

19 Sep 2025

Pyramid Token Pruning for High-Resolution Large Vision-Language Models via Region, Token, and Instruction-Guided Importance

152

19 Sep 2025

Qianfan-VL: Domain-Enhanced Universal Vision-Language Models

...

19 Sep 2025

Towards Rationale-Answer Alignment of LVLMs via Self-Rationale Calibration

102

17 Sep 2025

Diving into Mitigating Hallucinations from a Vision Perspective for Large Vision-Language Models

168

17 Sep 2025

SAIL-VL2 Technical Report

...

293

17 Sep 2025

HERO: Rethinking Visual Token Early Dropping in High-Resolution Large Vision-Language Models

185

16 Sep 2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

...

197

16 Sep 2025

AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models

...

333

16 Sep 2025

The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations

122

16 Sep 2025

MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment

116

15 Sep 2025

MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs

318

15 Sep 2025

Seeing is Not Understanding: A Benchmark on Perception-Cognition Disparities in Large Language Models

155

14 Sep 2025

The Telephone Game: Evaluating Semantic Drift in Unified Models

167

04 Sep 2025

Mitigating Multimodal Hallucinations via Gradient-based Self-Reflection

259

03 Sep 2025

OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation

364

03 Sep 2025

VLMs-in-the-Wild: Bridging the Gap Between Academic Benchmarks and Enterprise Reality

Srihari Bandraupalli

Anupam Purwar

VLM

03 Sep 2025

Understanding Space Is Rocket Science -- Only Top Reasoning Models Can Solve Spatial Understanding Tasks

190

02 Sep 2025

Implicit Reasoning in Large Language Models: A Comprehensive Survey

213

02 Sep 2025

Variation-aware Vision Token Dropping for Faster Large Vision-Language Models

01 Sep 2025

Kwai Keye-VL 1.5 Technical Report

...

326

01 Sep 2025

Robix: A Unified Model for Robot Interaction, Reasoning and Planning

168

01 Sep 2025

Improving Large Vision and Language Models by Learning from a Panel of Peers

129

01 Sep 2025

Reinforced Visual Perception with Tools

155

01 Sep 2025

TrimTokenator: Towards Adaptive Visual Token Pruning for Large Multimodal Models

181

30 Aug 2025

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

187

28 Aug 2025

Improving Alignment in LVLMs with Debiased Self-Judgment

209

28 Aug 2025

SUMMA: A Multimodal Large Language Model for Advertisement Summarization

130

28 Aug 2025

KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts

151

27 Aug 2025

PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality

26 Aug 2025

MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs

111

25 Aug 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

...

294

265

25 Aug 2025

VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference

116

25 Aug 2025

Language-Specific Layer Matters: Efficient Multilingual Enhancement for Large Vision-Language Models

25 Aug 2025

AVAM: Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering

135

25 Aug 2025

Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance

118

25 Aug 2025

Explain Before You Answer: A Survey on Compositional Visual Reasoning

...

355

24 Aug 2025

Towards Open World Detection: A Survey

Andrei-Stefan Bulzan

Cosmin Cernazanu-Glavan

ObjD VLM

215

22 Aug 2025

Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation

...

158

21 Aug 2025

Directed-Tokens: A Robust Multi-Modality Alignment Approach to Large Language-Vision Models

292

19 Aug 2025

Holistic Evaluation of Multimodal LLMs on Spatial Intelligence

...

259

18 Aug 2025

RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts

17 Aug 2025