MMBench: Is Your Multi-modal Model an All-around Player?

European Conference on Computer Vision (ECCV), 2023
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 687 papers shown
Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact
Hyunji Lee
Seunghyun Yoon
Yunjae Won
Hanseok Oh
Geewook Kim
Trung H. Bui
Franck Dernoncourt
Elias Stengel-Eskin
Mohit Bansal
Minjoon Seo
LRM
249
2
0
18 Jun 2025
Show-o2: Improved Native Unified Multimodal Models
Jinheng Xie
Zhenheng Yang
Mike Zheng Shou
VGen
476
90
0
18 Jun 2025
SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks
Zijian Song
Xiaoxin Lin
Qiuming Huang
Guangrun Wang
Liang Lin
LRM
379
5
0
17 Jun 2025
Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence
Jianlong Wu
Sihao Liu
Chuan Rao
Bang An
Tiancheng Shen
Juil Sock
Ming-Hsuan Yang
Bernard Ghanem
268
4
0
16 Jun 2025
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Shaolei Zhang
Shoutao Guo
Qingkai Fang
Yan Zhou
Yang Feng
MLLM, AuLLM, VLM
271
9
0
16 Jun 2025
Aligning MLLM Benchmark With Human Preferences via Structural Equation Modeling
Shengwu Xiong
Tianyu Zou
Cong Wang
Xuelong Li
111
0
0
13 Jun 2025
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang
Mengzhen Liu
Lichen Li
Ming Lu
Yuan Zhang
Junwen Pan
Qi She
Shanghang Zhang
VLM
400
18
0
12 Jun 2025
Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation
Zhiyang Xu
Jiuhai Chen
Zhaojiang Lin
Xichen Pan
Lifu Huang
...
Di Jin
Michihiro Yasunaga
Lili Yu
Xi Lin
Shaoliang Nie
361
4
0
12 Jun 2025
Vision Generalist Model: A Survey
International Journal of Computer Vision (IJCV), 2025
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
293
0
0
11 Jun 2025
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
Dianyi Wang
Wei Song
Yikun Wang
Siyuan Wang
Kaicheng Yu
Zhongyu Wei
Jiaqi Wang
210
3
0
10 Jun 2025
Synthetic Visual Genome
Computer Vision and Pattern Recognition (CVPR), 2025
J. S. Park
Zixian Ma
Linjie Li
Chenhao Zheng
Cheng-Yu Hsieh
...
Quan Kong
Norimasa Kobori
Ali Farhadi
Yejin Choi
Ranjay Krishna
212
0
0
09 Jun 2025
SUDER: Self-Improving Unified Large Multimodal Models for Understanding and Generation with Dual Self-Rewards
Jixiang Hong
Yiran Zhang
Guanzhong Wang
Yi Liu
Ji-Rong Wen
Rui Yan
LRM
236
1
0
09 Jun 2025
WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhiyu Lin
Zhengda Zhou
Zhiyuan Zhao
Tianrui Wan
Yilun Ma
Junyu Gao
Xuelong Li
ELM
206
6
0
09 Jun 2025
SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning
Mengya Xu
Zhongzhen Huang
Dillan Imans
Yiru Ye
Xiaofan Zhang
Qi Dou
181
1
0
08 Jun 2025
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
Tianyi Bai
Yuxuan Fan
Jiantao Qiu
Fupeng Sun
Jiayi Song
Junlin Han
Zichen Liu
Conghui He
Wentao Zhang
Binhang Yuan
MLLM, VLM
277
2
0
08 Jun 2025
CoMemo: LVLMs Need Image Context with Image Memory
Shi-Qi Liu
Weijie Su
Xizhou Zhu
Wenhai Wang
Jifeng Dai
VLM
218
0
0
06 Jun 2025
ExAct: A Video-Language Benchmark for Expert Action Analysis
Han Yi
Yulu Pan
Feihong He
Xinyu Liu
Benjamin Zhang
Oluwatumininu Oguntola
Gedas Bertasius
202
1
0
06 Jun 2025
MokA: Multimodal Low-Rank Adaptation for MLLMs
Yake Wei
Yu Miao
Dongzhan Zhou
Di Hu
274
0
0
05 Jun 2025
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
Linjie Li
Mahtab Bigverdi
Jiawei Gu
Zixian Ma
Yinuo Yang
Ziang Li
Yejin Choi
Ranjay Krishna
LRM
234
8
0
05 Jun 2025
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
Jiahui Wang
Z. Liu
Yongming Rao
Jiwen Lu
VLM, LRM
474
3
0
05 Jun 2025
Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction
Zhihao Tang
Chaozhuo Li
Litian Zhang
Xi Zhang
DiffM, MedIm
186
14
0
05 Jun 2025
MiMo-VL Technical Report
Xiaomi LLM-Core Team
Zihao Yue
Zhenru Lin
Yifan Song
Weikun Wang
...
Di Zhang
Chong Ma
Chang Liu
Can Cai
Bingquan Xia
OffRL, MoE, VLM, LRM
258
16
0
04 Jun 2025
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Mengdi Jia
Zekun Qi
Shaochen Zhang
Wenyao Zhang
Xinqiang Yu
Jiawei He
He Wang
L. Yi
LRM, VLM
332
28
0
03 Jun 2025
PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Jenny Schmalfuss
Nadine Chang
Vibashan VS
Maying Shen
Andrés Bruhn
Jose M. Alvarez
VLM
232
0
0
03 Jun 2025
Learning Sparsity for Effective and Efficient Music Performance Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xingjian Diao
Tianzhen Yang
Chunhui Zhang
Weiyi Wu
Ming Cheng
Jiang Gui
244
6
0
02 Jun 2025
MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
Xiaojun Shan
Qi Cao
Xing Han
Haofei Yu
Paul Liang
284
1
0
02 Jun 2025
K12Vista: Exploring the Boundaries of MLLMs in K-12 Education
Chong Li
C. Zhu
Tao Zhang
Mingan Lin
Zenan Zhou
Jian Xie
LRM
189
1
0
02 Jun 2025
NavBench: Probing Multimodal Large Language Models for Embodied Navigation
Yanyuan Qiao
Haodong Hong
Wenqi Lyu
Dong An
Siqi Zhang
Yutong Xie
Xinyu Wang
Qi Wu
LM&Ro
250
5
0
01 Jun 2025
Improve MLLM Benchmark Efficiency through Interview
Farong Wen
Yijin Guo
Junying Wang
Jiaohao Xiao
Yingjie Zhou
Chunyi Li
Qi Jia
Guangtao Zhai
Zicheng Zhang
MLLM
225
2
0
01 Jun 2025
GuessBench: Sensemaking Multimodal Creativity in the Wild
Zifeng Zhu
Shangbin Feng
Herun Wan
Ningnan Wang
Minnan Luo
Yulia Tsvetkov
MLLM, CoGe, VLM
311
1
0
01 Jun 2025
Affordance Benchmark for MLLMs
Junying Wang
Wenzhe Li
Yalun Wu
Yingji Liang
Yijin Guo
Chunyi Li
Haodong Duan
Zicheng Zhang
Guangtao Zhai
247
4
0
01 Jun 2025
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
Gen Luo
Ganlin Yang
Ziyang Gong
Guanzhou Chen
Haonan Duan
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Rongrong Ji
X. Zhu
LM&Ro
203
19
0
30 May 2025
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Yaxin Luo
Zhaoyi Li
Jiacheng Liu
Jiacheng Cui
Xiaohan Zhao
Zhiqiang Shen
LLMAG, LRM, VLM
276
7
0
30 May 2025
SORCE: Small Object Retrieval in Complex Environments
Chunxu Liu
Chi Xie
X. Chen
Wei Li
Feng Zhu
Rui Zhao
Limin Wang
148
0
0
30 May 2025
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways
Kailin Jiang
Yuntao Du
Yukai Ding
Yuchen Ren
Ning Jiang
Zhi Gao
Zilong Zheng
Lei Liu
Bin Li
Qing Li
KELM
222
2
0
30 May 2025
Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts
Xin He
Xumeng Han
Longhui Wei
Lingxi Xie
Qi Tian
MoE
179
2
0
30 May 2025
Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap
Wenhan Yang
Spencer Stice
Ali Payani
Baharan Mirzasoleiman
MLLM
222
1
0
30 May 2025
Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck
Yuwen Tan
Yuan Qing
Boqing Gong
280
6
0
30 May 2025
Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
Xu Chu
Xinrong Chen
Guanyu Wang
Zhijie Tan
Kui Huang
Wenyu Lv
Tong Mo
Weiping Li
LRM, VLM
326
6
0
29 May 2025
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Qingyu Shi
Jinbin Bai
Zhuoran Zhao
Wenhao Chai
Kaidong Yu
...
Shuangyong Song
Yunhai Tong
Xiangtai Li
X. Li
Shuicheng Yan
334
23
0
29 May 2025
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
Jihai Zhang
Tianle Li
Linjie Li
Zhengyuan Yang
Yu Cheng
182
6
0
29 May 2025
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models
Linglin Jing
Yuting Gao
Zhigang Wang
Wang Lan
Yiwen Tang
Wenhai Wang
Kaipeng Zhang
Qingpei Guo
MoE
215
1
0
28 May 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLM, LRM, VLM
256
6
0
28 May 2025
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
Ce Zhang
Kaixin Ma
Tianqing Fang
Wenhao Yu
Hongming Zhang
Zhisong Zhang
Yaqi Xie
Katia Sycara
Haitao Mi
Dong Yu
VLM
312
7
0
28 May 2025
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue
Vasu Singla
Menglin Jia
John Kirchenbauer
Rifaa Qadri
Zikui Cai
A. Bhatele
Furong Huang
Tom Goldstein
VLM
235
0
0
28 May 2025
Spatial Knowledge Graph-Guided Multimodal Synthesis
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Yida Xue
Zhen Bi
Jinnan Yang
Jungang Lou
Ningyu Zhang
M. Zhang
Huajun Chen
Ningyu Zhang
347
0
0
28 May 2025
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
Xuanwen Ding
Chengjun Pan
Zejun Li
Jiwen Zhang
Siyuan Wang
Zhongyu Wei
262
0
0
27 May 2025
Evaluating and Steering Modality Preferences in Multimodal Large Language Model
Yu Zhang
Jinlong Ma
Yongshuai Hou
Xuefeng Bai
Kehai Chen
Yang Xiang
Jun Yu
Min Zhang
380
7
0
27 May 2025
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Peter Robicheaux
Matvei Popov
Anish Madan
Isaac Robinson
Joseph Nelson
Deva Ramanan
Neehar Peri
ObjD, VLM
385
16
0
27 May 2025
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang
Yao Lai
Aoxue Li
Shifeng Zhang
Jiacheng Sun
Ning Kang
Chengyue Wu
Zhenguo Li
Ping Luo
394
19
0
26 May 2025