MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2024
arXiv:2307.06281 · 12 July 2023
Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin
ArXiv (abs) · PDF · HTML · HuggingFace (5 upvotes)

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 687 papers shown

Distill Visual Chart Reasoning Ability from LLMs to MLLMs
Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Xuanjing Huang
LRM · 24 Oct 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing, Qidong Huang, Xiaoyi Dong, Jiajie Lu, Pan Zhang, ..., Yuhang Cao, Bin Wang, Jiaqi Wang, Feng Wu, Dahua Lin
VLM · 22 Oct 2024

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
International Conference on Learning Representations (ICLR), 2024
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma
22 Oct 2024

OpenMU: Your Swiss Army Knife for Music Understanding
Mengjie Zhao, Zhi-Wei Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji
OSLM · 21 Oct 2024

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Y. Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Zhucun Xue, Yong-Jin Liu, X. Bai
VLM · 21 Oct 2024

Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models
Olga Loginova, Oleksandr Bezrukov, Ravi Shekhar, Alexey Kravets
18 Oct 2024

ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
Yin Xie, Kaicheng Yang, Ninghua Yang, Weimo Deng, Xiangzi Dai, Tiancheng Gu, Yumeng Wang, Xiang An, Yongle Zhao, Ziyong Feng
MLLM, VLM · 18 Oct 2024

NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Neural Information Processing Systems (NeurIPS), 2024
Baiqi Li, Zhiqiu Lin, Wenxuan Peng, Jean de Dieu Nyandwi, Daniel Jiang, Zixian Ma, Simran Khanuja, Ranjay Krishna, Graham Neubig, Deva Ramanan
AAML, CoGe, VLM · 18 Oct 2024

FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion
Jiacheng Ruan, Yebin Yang, Peng Liu, Feiyu Xiong, Zeyun Tang, Zhiyu Li
VLM · 16 Oct 2024

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu, Linchao Zhu, Yi Yang
16 Oct 2024

MEV Capture Through Time-Advantaged Arbitrage
Robin Fritsch, Maria Ines Silva, A. Mamageishvili, Benjamin Livshits, E. Felten
14 Oct 2024

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, ..., Chenhang Cui, Mingyu Ding, Linjie Li, Lijuan Wang, Huaxiu Yao
14 Oct 2024

Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould
14 Oct 2024

3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto, Yang Liu, Tongtong Cao, Y. Ren, Liu Bingbing
14 Oct 2024

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
International Conference on Learning Representations (ICLR), 2024
Yue Yang, Shanghang Zhang, Wenqi Shao, Kaipeng Zhang, Yi Bin, Yu Wang, Ping Luo
11 Oct 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Computer Vision and Pattern Recognition (CVPR), 2024
Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jifeng Dai, Yu Qiao, Xizhou Zhu
VLM, MLLM · 10 Oct 2024

Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
Shengcao Cao, Liang-Yan Gui, Yu Wang
10 Oct 2024

Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xiaoyuan Liu, Wenxuan Wang, Youliang Yuan, Shu Yang, Qiuzhi Liu, Pinjia He, Zhaopeng Tu
10 Oct 2024

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
International Conference on Learning Representations (ICLR), 2024
Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou, Mohsen Fayyaz, Pan Lu, Kai-Wei Chang, Nanyun Peng
VLM · 10 Oct 2024

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
International Conference on Learning Representations (ICLR), 2024
Yi Ding, Bolian Li, Ruqi Zhang
MLLM · 09 Oct 2024

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment
International Conference on Learning Representations (ICLR), 2024
Yifei Xing, Xiangyuan Lan, Ruiping Wang, Shihong Deng, Wenjun Huang, Qingfang Zheng, Yaowei Wang
Mamba · 08 Oct 2024

ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt
Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu
VLM, CLL · 08 Oct 2024

MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models
Kaichen Huang, Jiahao Huo, Yibo Yan, Kun Wang, Yutao Yue, Xuming Hu
07 Oct 2024

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ziyue Wang, Chi Chen, Yurui Dong, Yuanchi Zhang, Yuzhuang Xu, Xiaolong Wang, Ziwei Sun, Yang Liu
LRM · 07 Oct 2024

MM-R$^3$: On (In-)Consistency of Vision-Language Models (VLMs)
Shih-Han Chou, Shivam Chandhok, James J. Little, Leonid Sigal
07 Oct 2024

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta, Shreyas Verma, Ujjwala Anantheswaran, Kevin Scaria, Mihir Parmar, Swaroop Mishra, Chitta Baral
ReLM, LRM · 06 Oct 2024

Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs
Javier Marin
LRM · 06 Oct 2024

Gamified crowd-sourcing of high-quality data for visual fine-tuning
Shashank Yadav, Rohan Tomar, Garvit Jain, Chirag Ahooja, Shubham Chaudhary, Charles Elkan
05 Oct 2024

Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jiayi He, Hehai Lin, Q. Wang, Yi R. Fung, Chenhui Xu
ReLM, LRM · 05 Oct 2024

Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Xin Zou, Yizhou Wang, Yibo Yan, Yuanhuiyi Lyu, Kening Zheng, ..., Junkai Chen, Peijie Jiang, Qingbin Liu, Chang Tang, Xuming Hu
04 Oct 2024

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
International Conference on Learning Representations (ICLR), 2024
Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu
MLLM · 03 Oct 2024

EMMA: Efficient Visual Alignment in Multi-Modal LLMs
Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami
VLM · 02 Oct 2024

ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art
Qi Jia, Xiang Yue, Shanshan Huang, Ziheng Qin, Yizhu Liu, Bill Yuchen Lin, Yang You, Guangtao Zhai
VLM · 02 Oct 2024

Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
Computer Vision and Pattern Recognition (CVPR), 2024
Zicheng Zhang, Ziheng Jia, H. Wu, Chunyi Li, Zijian Chen, ..., Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai
30 Sep 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Computer Vision and Pattern Recognition (CVPR), 2024
Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, ..., Qun Liu, Jun Yao, Lu Hou, Hang Xu
AuLLM, MLLM, VLM · 26 Sep 2024

DARE: Diverse Visual Question Answering with Robustness Evaluation
Transactions of the Association for Computational Linguistics (TACL), 2024
Hannah Sterz, Jonas Pfeiffer, Ivan Vulić
OOD, VLM · 26 Sep 2024

EventHallusion: Diagnosing Event Hallucinations in Video LLMs
Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Na Zhao, Zhiyu Tan, Hao Li, Yue Yu
MLLM · 25 Sep 2024

OmniBench: Towards The Future of Universal Omni-Language Models
Y. Li, Ge Zhang, Yinghao Ma, Ruibin Yuan, Kang Zhu, ..., Zhaoxiang Zhang, Zachary Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin
LRM · 23 Sep 2024

AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhibin Lan, Liqiang Niu, Fandong Meng, Wenbo Li, Jie Zhou, Jinsong Su
VLM · 20 Sep 2024

JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Neural Information Processing Systems (NeurIPS), 2024
Zhecan Wang, Junzhang Liu, Chia-Wei Tang, Hani Alomari, Anushka Sivakumar, ..., Haoxuan You, A. Ishmam, Kai-Wei Chang, Shih-Fu Chang, Chris Thomas
CoGe, VLM · 19 Sep 2024

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
International Conference on Learning Representations (ICLR), 2024
Zuyan Liu, Yuhao Dong, Ziwei Liu, Winston Hu, Jiwen Lu, Yongming Rao
ObjD · 19 Sep 2024

Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding
Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2024
Xiaoyu Liang, Jiayuan Yu, Lianrui Mu, Jiedong Zhuang, Jiaqi Hu, Yuchen Yang, Jiangnan Ye, Lu Lu, Jian Chen, Haoji Hu
VLM · 10 Sep 2024

COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes
AAAI Conference on Artificial Intelligence (AAAI), 2024
Koen Kraaijveld, Yifan Jiang, Kaixin Ma, Filip Ilievski
LRM · 06 Sep 2024

An overview of domain-specific foundation model: key technologies, applications and challenges
Science China Information Sciences (Sci. China Inf. Sci.), 2024
Haolong Chen, Hanzhi Chen, Zijian Zhao, Kaifeng Han, Guangxu Zhu, Yichen Zhao, Ying Du, Wei Xu, Qingjiang Shi
ALM, VLM · 06 Sep 2024

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
AAAI Conference on Artificial Intelligence (AAAI), 2024
Baichuan Zhou, Haote Yang, Dairong Chen, Junyan Ye, Tianyi Bai, Jinhua Yu, Songyang Zhang, Dahua Lin, Conghui He, Weijia Li
VLM · 30 Aug 2024

Law of Vision Representation in MLLMs
Shijia Yang, Bohan Zhai, Quanzeng You, Jianbo Yuan, Hongxia Yang, Chenfeng Xu
29 Aug 2024

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, ..., Andrew Tao, Zhiding Yu, Guilin Liu
MLLM · 28 Aug 2024

IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
AAAI Conference on Artificial Intelligence (AAAI), 2024
Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin
MLLM · 23 Aug 2024

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
International Conference on Learning Representations (ICLR), 2024
Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, ..., Qingsong Wen, Zhang Zhang, Liwen Wang, Rong Jin, Tieniu Tan
OffRL · 23 Aug 2024

ParGo: Bridging Vision-Language with Partial and Global Views
AAAI Conference on Artificial Intelligence (AAAI), 2024
An-Lan Wang, Bin Shan, Wei Shi, Kun-Yu Lin, Xiang Fei, Guozhi Tang, Lei Liao, Jingqun Tang, Can Huang, Wei-Shi Zheng
MLLM, VLM · 23 Aug 2024

Page 11 of 14