MMBench: Is Your Multi-modal Model an All-around Player?

European Conference on Computer Vision (ECCV), 2023

12 July 2023

Conghui He

Ziwei Liu

Kai-xiang Chen

Dahua Lin

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 687 papers shown

VideoAds for Fast-Paced Video Understanding

302

12 Apr 2025

Data Metabolism: An Efficient Data Design Schema For Vision Language Model

389

10 Apr 2025

MM-IFEngine: Towards Multimodal Instruction Following

524

10 Apr 2025

Kimi-VL Technical Report

...

994

144

10 Apr 2025

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

384

10 Apr 2025

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

599

10 Apr 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

...

319

10 Apr 2025

V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models

363

08 Apr 2025

On the Suitability of Reinforcement Fine-Tuning to Visual Tasks

352

08 Apr 2025

LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts

288

07 Apr 2025

VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning

342

03 Apr 2025

Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

584

01 Apr 2025

AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference

382

31 Mar 2025

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO

314

31 Mar 2025

Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs

438

29 Mar 2025

Unicorn: Text-Only Data Synthesis for Vision Language Model Training

247

28 Mar 2025

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Computer Vision and Pattern Recognition (CVPR), 2025

393

28 Mar 2025

Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models

307

28 Mar 2025

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

987

27 Mar 2025

InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression

421

27 Mar 2025

On Large Multimodal Models as Open-World Image Classifiers

464

27 Mar 2025

Beyond Intermediate States: Explaining Visual Redundancy through Language

251

26 Mar 2025

From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment

317

26 Mar 2025

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping

635

26 Mar 2025

Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy

306

26 Mar 2025

Dynamic Pyramid Network for Efficient Multimodal Large Language Model

369

26 Mar 2025

LangBridge: Interpreting Image as a Combination of Language Embeddings

...

361

25 Mar 2025

RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models

386

25 Mar 2025

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?

447

25 Mar 2025

Scaling Vision Pre-Training to 4K Resolution

Computer Vision and Pattern Recognition (CVPR), 2025

...

906

25 Mar 2025

CoMP: Continual Multimodal Pre-training for Vision Foundation Models

487

24 Mar 2025

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Computer Vision and Pattern Recognition (CVPR), 2025

...

479

24 Mar 2025

Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding

561

24 Mar 2025

Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook

384

23 Mar 2025

MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers

414

21 Mar 2025

A Vision Centric Remote Sensing Benchmark

Abduljaleel Adejumo

Faegheh Yeganli

Clifford Broni-bediako

Aoran Xiao

Xiangwei Zhu

Mennatullah Siam

403

20 Mar 2025

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Computer Vision and Pattern Recognition (CVPR), 2025

423

19 Mar 2025

VisNumBench: Evaluating Number Sense of Multimodal Large Language Models

290

19 Mar 2025

DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies

401

18 Mar 2025

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

...

343

18 Mar 2025

Growing a Twig to Accelerate Large Vision-Language Models

368

18 Mar 2025

Aligning Multimodal LLM with Human Preference: A Survey

...

838

18 Mar 2025

Can Large Vision Language Models Read Maps Like a Human?

391

18 Mar 2025

Squeeze Out Tokens from Sample for Finer-Grained Data Governance

...

290

18 Mar 2025

NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models

257

17 Mar 2025

ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

375

17 Mar 2025

Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference

356

17 Mar 2025

From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration

Computer Vision and Pattern Recognition (CVPR), 2025

553

17 Mar 2025

HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

400

17 Mar 2025

MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling

439

17 Mar 2025