MMBench: Is Your Multi-modal Model an All-around Player?

European Conference on Computer Vision (ECCV), 2023

12 July 2023

Conghui He

Ziwei Liu

Kai-xiang Chen

Dahua Lin

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 687 papers shown

Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model

400

13 Feb 2025

Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Mohammad Mahdi Abootorabi

Amirhosein Zobeiri

Mahdi Dehghani

Mohammadali Mohammadkhani

724

12 Feb 2025

HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation

International Conference on Learning Representations (ICLR), 2025

...

756

08 Feb 2025

PixelWorld: How Far Are We from Perceiving Everything as Pixels?

Zhiheng Lyu

Xueguang Ma

Wenhu Chen

675

31 Jan 2025

Benchmarking Gaslighting Negation Attacks Against Multimodal Large Language Models

1.2K

31 Jan 2025

Baichuan-Omni-1.5 Technical Report

Tao Zhang

...

330

28 Jan 2025

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

600

21 Jan 2025

Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

372

19 Jan 2025

Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces

460

17 Jan 2025

LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models

250

13 Jan 2025

OneLLM: One Framework to Align All Modalities with Language

Computer Vision and Pattern Recognition (CVPR), 2023

577

198

10 Jan 2025

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

International Conference on Learning Representations (ICLR), 2025

456

106

07 Jan 2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

...

612

07 Jan 2025

FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance

981

05 Jan 2025

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

Neural Information Processing Systems (NeurIPS), 2024

...

868

121

03 Jan 2025

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

769

02 Jan 2025

Diving into Self-Evolving Training for Multimodal Reasoning

435

23 Dec 2024

CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

241

22 Dec 2024

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Computer Vision and Pattern Recognition (CVPR), 2024

...

514

20 Dec 2024

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Computer Vision and Pattern Recognition (CVPR), 2024

528

349

18 Dec 2024

FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

464

17 Dec 2024

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Computer Vision and Pattern Recognition (CVPR), 2024

...

355

12 Dec 2024

Olympus: A Universal Task Router for Computer Vision Tasks

Computer Vision and Pattern Recognition (CVPR), 2024

1.2K

12 Dec 2024

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

419

05 Dec 2024

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Computer Vision and Pattern Recognition (CVPR), 2024

481

04 Dec 2024

AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?

673

04 Dec 2024

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

449

04 Dec 2024

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

International Conference on Learning Representations (ICLR), 2024

401

02 Dec 2024

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Computer Vision and Pattern Recognition (CVPR), 2024

396

02 Dec 2024

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Computer Vision and Pattern Recognition (CVPR), 2024

...

607

27 Nov 2024

Evaluating Vision-Language Models as Evaluators in Path Planning

Computer Vision and Pattern Recognition (CVPR), 2024

671

27 Nov 2024

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

565

27 Nov 2024

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

Computer Vision and Pattern Recognition (CVPR), 2024

888

25 Nov 2024

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Computer Vision and Pattern Recognition (CVPR), 2024

...

813

25 Nov 2024

Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts

Computer Vision and Pattern Recognition (CVPR), 2024

392

23 Nov 2024

FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression

317

21 Nov 2024

From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning

444

19 Nov 2024

MC-LLaVA: Multi-Concept Personalized Vision-Language Model

...

Wentao Zhang

661

18 Nov 2024

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Computer Vision and Pattern Recognition (CVPR), 2024

...

426

17 Nov 2024

Multimodal Instruction Tuning with Hybrid State Space Models

267

13 Nov 2024

Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM

1.1K

06 Nov 2024

Classification Done Right for Vision-Language Pre-Training

Neural Information Processing Systems (NeurIPS), 2024

421

05 Nov 2024

Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios

...

359

05 Nov 2024

LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models

613

01 Nov 2024

ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Kimihiro Hasegawa

Wiradee Imrattanatrai

260

29 Oct 2024

AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?

Xiangqi Wang

Mohamed Elhoseiny

Xiangliang Zhang

337

28 Oct 2024

EfficientEQA: An Efficient Approach to Open-Vocabulary Embodied Question Answering

188

26 Oct 2024

Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)

International Conference on Learning Representations (ICLR), 2024

435

25 Oct 2024

Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?

589

25 Oct 2024

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

...

Yufeng Cui

Xinlong Wang

Yaoqi Liu

Fangxiang Feng

Guang Liu

448

24 Oct 2024