MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2024
arXiv:2307.06281 · 12 July 2023
Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin
ArXiv (abs) · PDF · HTML · HuggingFace (5 upvotes)

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 687 papers shown

Distill Visual Chart Reasoning Ability from LLMs to MLLMs
Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Xuanjing Huang
LRM · 24 Oct 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing, Qidong Huang, Xiaoyi Dong, Jiajie Lu, Pan Zhang, ..., Yuhang Cao, Bin Wang, Jiaqi Wang, Feng Wu, Dahua Lin
VLM · 22 Oct 2024

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
International Conference on Learning Representations (ICLR), 2024
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma
22 Oct 2024

OpenMU: Your Swiss Army Knife for Music Understanding
Mengjie Zhao, Zhi-Wei Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji
OSLM · 21 Oct 2024

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Y. Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Zhucun Xue, Yong-Jin Liu, X. Bai
VLM · 21 Oct 2024

Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models
Olga Loginova, Oleksandr Bezrukov, Ravi Shekhar, Alexey Kravets
18 Oct 2024

ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
Yin Xie, Kaicheng Yang, Ninghua Yang, Weimo Deng, Xiangzi Dai, Tiancheng Gu, Yumeng Wang, Xiang An, Yongle Zhao, Ziyong Feng
MLLM, VLM · 18 Oct 2024

NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Neural Information Processing Systems (NeurIPS), 2024
Baiqi Li, Zhiqiu Lin, Wenxuan Peng, Jean de Dieu Nyandwi, Daniel Jiang, Zixian Ma, Simran Khanuja, Ranjay Krishna, Graham Neubig, Deva Ramanan
AAML, CoGe, VLM · 18 Oct 2024

FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion
Jiacheng Ruan, Yebin Yang, Peng Liu, Feiyu Xiong, Zeyun Tang, Zhiyu Li
VLM · 16 Oct 2024

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu, Linchao Zhu, Yi Yang
16 Oct 2024

MEV Capture Through Time-Advantaged Arbitrage
Robin Fritsch, Maria Ines Silva, A. Mamageishvili, Benjamin Livshits, E. Felten
14 Oct 2024

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, ..., Chenhang Cui, Mingyu Ding, Linjie Li, Lijuan Wang, Huaxiu Yao
14 Oct 2024

Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould
14 Oct 2024

3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto, Yang Liu, Tongtong Cao, Y. Ren, Liu Bingbing
14 Oct 2024

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
International Conference on Learning Representations (ICLR), 2024
Yue Yang, Shanghang Zhang, Wenqi Shao, Kaipeng Zhang, Yi Bin, Yu Wang, Ping Luo
11 Oct 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Computer Vision and Pattern Recognition (CVPR), 2024
Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jifeng Dai, Yu Qiao, Xizhou Zhu
VLM, MLLM · 10 Oct 2024

Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
Shengcao Cao, Liang-Yan Gui, Yu Wang
10 Oct 2024

Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xiaoyuan Liu, Wenxuan Wang, Youliang Yuan, Shu Yang, Qiuzhi Liu, Pinjia He, Zhaopeng Tu
10 Oct 2024

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
International Conference on Learning Representations (ICLR), 2024
Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou, Mohsen Fayyaz, Pan Lu, Kai-Wei Chang, Nanyun Peng
VLM · 10 Oct 2024

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
International Conference on Learning Representations (ICLR), 2024
Yi Ding, Bolian Li, Ruqi Zhang
MLLM · 09 Oct 2024

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment
International Conference on Learning Representations (ICLR), 2024
Yifei Xing, Xiangyuan Lan, Ruiping Wang, Shihong Deng, Wenjun Huang, Qingfang Zheng, Yaowei Wang
Mamba · 08 Oct 2024

ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt
Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu
VLM, CLL · 08 Oct 2024

MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models
Kaichen Huang, Jiahao Huo, Yibo Yan, Kun Wang, Yutao Yue, Xuming Hu
07 Oct 2024

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ziyue Wang, Chi Chen, Yurui Dong, Yuanchi Zhang, Yuzhuang Xu, Xiaolong Wang, Ziwei Sun, Yang Liu
LRM · 07 Oct 2024

MM-R$^3$: On (In-)Consistency of Vision-Language Models (VLMs)
Shih-Han Chou, Shivam Chandhok, James J. Little, Leonid Sigal
07 Oct 2024

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta, Shreyas Verma, Ujjwala Anantheswaran, Kevin Scaria, Mihir Parmar, Swaroop Mishra, Chitta Baral
ReLM, LRM · 06 Oct 2024

Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs
Javier Marin
LRM · 06 Oct 2024

Gamified crowd-sourcing of high-quality data for visual fine-tuning
Shashank Yadav, Rohan Tomar, Garvit Jain, Chirag Ahooja, Shubham Chaudhary, Charles Elkan
05 Oct 2024

Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jiayi He, Hehai Lin, Q. Wang, Yi R. Fung, Chenhui Xu
ReLM, LRM · 05 Oct 2024

Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Xin Zou, Yizhou Wang, Yibo Yan, Yuanhuiyi Lyu, Kening Zheng, ..., Junkai Chen, Peijie Jiang, Qingbin Liu, Chang Tang, Xuming Hu
04 Oct 2024

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
International Conference on Learning Representations (ICLR), 2024
Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu
MLLM · 03 Oct 2024

EMMA: Efficient Visual Alignment in Multi-Modal LLMs
Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami
VLM · 02 Oct 2024

ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art
Qi Jia, Xiang Yue, Shanshan Huang, Ziheng Qin, Yizhu Liu, Bill Yuchen Lin, Yang You, Guangtao Zhai
VLM · 02 Oct 2024

Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
Computer Vision and Pattern Recognition (CVPR), 2024
Zicheng Zhang, Ziheng Jia, H. Wu, Chunyi Li, Zijian Chen, ..., Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai
30 Sep 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Computer Vision and Pattern Recognition (CVPR), 2024
Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, ..., Qun Liu, Jun Yao, Lu Hou, Hang Xu
AuLLM, MLLM, VLM · 26 Sep 2024

DARE: Diverse Visual Question Answering with Robustness Evaluation
Transactions of the Association for Computational Linguistics (TACL), 2024
Hannah Sterz, Jonas Pfeiffer, Ivan Vulić
OOD, VLM · 26 Sep 2024

EventHallusion: Diagnosing Event Hallucinations in Video LLMs
Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Na Zhao, Zhiyu Tan, Hao Li, Yue Yu
MLLM · 25 Sep 2024

OmniBench: Towards The Future of Universal Omni-Language Models
Y. Li, Ge Zhang, Yinghao Ma, Ruibin Yuan, Kang Zhu, ..., Zhaoxiang Zhang, Zachary Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin
LRM · 23 Sep 2024

AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhibin Lan, Liqiang Niu, Fandong Meng, Wenbo Li, Jie Zhou, Jinsong Su
VLM · 20 Sep 2024

JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Neural Information Processing Systems (NeurIPS), 2024
Zhecan Wang, Junzhang Liu, Chia-Wei Tang, Hani Alomari, Anushka Sivakumar, ..., Haoxuan You, A. Ishmam, Kai-Wei Chang, Shih-Fu Chang, Chris Thomas
CoGe, VLM · 19 Sep 2024

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
International Conference on Learning Representations (ICLR), 2024
Zuyan Liu, Yuhao Dong, Ziwei Liu, Winston Hu, Jiwen Lu, Yongming Rao
ObjD · 19 Sep 2024

Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding
Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2024
Xiaoyu Liang, Jiayuan Yu, Lianrui Mu, Jiedong Zhuang, Jiaqi Hu, Yuchen Yang, Jiangnan Ye, Lu Lu, Jian Chen, Haoji Hu
VLM · 10 Sep 2024

COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes
AAAI Conference on Artificial Intelligence (AAAI), 2024
Koen Kraaijveld, Yifan Jiang, Kaixin Ma, Filip Ilievski
LRM · 06 Sep 2024

An overview of domain-specific foundation model: key technologies, applications and challenges
Science China Information Sciences (Sci. China Inf. Sci.), 2024
Haolong Chen, Hanzhi Chen, Zijian Zhao, Kaifeng Han, Guangxu Zhu, Yichen Zhao, Ying Du, Wei Xu, Qingjiang Shi
ALM, VLM · 06 Sep 2024

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
AAAI Conference on Artificial Intelligence (AAAI), 2024
Baichuan Zhou, Haote Yang, Dairong Chen, Junyan Ye, Tianyi Bai, Jinhua Yu, Songyang Zhang, Dahua Lin, Conghui He, Weijia Li
VLM · 30 Aug 2024

Law of Vision Representation in MLLMs
Shijia Yang, Bohan Zhai, Quanzeng You, Jianbo Yuan, Hongxia Yang, Chenfeng Xu
29 Aug 2024

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, ..., Andrew Tao, Zhiding Yu, Guilin Liu
MLLM · 28 Aug 2024

IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
AAAI Conference on Artificial Intelligence (AAAI), 2024
Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin
MLLM · 23 Aug 2024

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
International Conference on Learning Representations (ICLR), 2024
Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, ..., Qingsong Wen, Zhang Zhang, Liwen Wang, Rong Jin, Tieniu Tan
OffRL · 23 Aug 2024

ParGo: Bridging Vision-Language with Partial and Global Views
AAAI Conference on Artificial Intelligence (AAAI), 2024
An-Lan Wang, Bin Shan, Wei Shi, Kun-Yu Lin, Xiang Fei, Guozhi Tang, Lei Liao, Jingqun Tang, Can Huang, Wei-Shi Zheng
MLLM, VLM · 23 Aug 2024

Page 11 of 14