MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2024
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 686 papers shown
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Mingxuan Li
Silei Wu
Linjun Dai
Xiaohua Wang
Hanming Deng
Lewei Lu
Dahua Lin
Ziwei Liu
VLM
156
0
0
16 Oct 2025
CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection
Hojun Choi
Youngsun Lim
Jaeyo Shin
Hyunjung Shim
ObjD, LRM, VLM
369
1
0
16 Oct 2025
Train a Unified Multimodal Data Quality Classifier with Synthetic Data
Weizhi Wang
Rongmei Lin
Shiyang Li
Colin Lockard
Ritesh Sarkhel
Sanket Lokegaonkar
Jingbo Shang
Xifeng Yan
Nasser Zalmout
Xian Li
99
0
0
16 Oct 2025
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Xingrui Wang
Jiang Liu
Chao Huang
X. Yu
Ze Wang
Ximeng Sun
Jialian Wu
Alan Yuille
Emad Barsoum
Zicheng Liu
VLM
101
0
0
16 Oct 2025
Vision-Centric Activation and Coordination for Multimodal Large Language Models
Yunnan Wang
Fan Lu
Kecheng Zheng
Ziyuan Huang
Ziqiang Li
Wenjun Zeng
Xin Jin
MLLM
359
0
0
16 Oct 2025
VisCoP: Visual Probing for Video Domain Adaptation of Vision Language Models
Dominick Reilly
Manish Kumar Govind
Le Xue
Srijan Das
VLM
151
0
0
15 Oct 2025
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
Run Luo
Xiaobo Xia
Lu Wang
Longze Chen
Renke Shan
Jing Luo
Min Yang
Tat-Seng Chua
VGen
241
4
0
15 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM, AuLLM, VGen, VLM
428
4
0
15 Oct 2025
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Zhenxin Lei
Zhangwei Gao
Changyao Tian
Erfei Cui
Guanzhou Chen
...
Xiangyu Zhao
Jiayi Ji
Yu Qiao
Wenhai Wang
Gen Luo
VLM
248
0
0
14 Oct 2025
VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage
A. Alfarano
L. Venturoli
D. Negueruela del Castillo
CoGe, VLM
201
0
0
14 Oct 2025
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
Weiyang Jin
Yuwei Niu
Jiaqi Liao
Chengqi Duan
Aoxue Li
Shenghua Gao
Xihui Liu
LRM
208
4
0
14 Oct 2025
ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution
Long Cui
Weiyun Wang
Jie Shao
Zichen Wen
Gen Luo
Linfeng Zhang
Y. Zhang
Yu Qiao
Wenhai Wang
VLM
186
2
0
14 Oct 2025
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
Kartik Narayan
Yang Xu
Tian Cao
Kavya Nerella
Vishal M. Patel
Navid Shiee
Peter Grasch
Chao Jia
Yinfei Yang
Zhe Gan
ObjD, KELM, VLM
261
5
0
14 Oct 2025
Prompt-Guided Spatial Understanding with RGB-D Transformers for Fine-Grained Object Relation Reasoning
Tanner Muturi
Blessing Agyei Kyem
Joshua Kofi Asamoah
Neema Jakisa Owor
Richard Dyzinela
Andrews Danyo
Y. Adu-Gyamfi
Armstrong Aboah
LRM
122
3
0
13 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro, AIFin, AI4TS, LRM, AI4CE
250
5
0
13 Oct 2025
FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models
Shengming Yuan
Xinyu Lyu
Shuailong Wang
Beitao Chen
Jingkuan Song
Lianli Gao
LRM
268
0
0
13 Oct 2025
Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization
Zihan Wang
Zhiyong Ma
Zhongkui Ma
Shuofeng Liu
Akide Liu
Derui Wang
Minhui Xue
Guangdong Bai
AAML
131
1
0
13 Oct 2025
ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
L. Yang
Huiyu Duan
Ran Tao
Juntao Cheng
Sijing Wu
Yunhao Li
Jing Liu
Xiongkuo Min
Guangtao Zhai
VLM
116
0
0
13 Oct 2025
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
Caorui Li
Yu Chen
Yiyan Ji
Jin Xu
Zhenyu Cui
...
Zili Wang
Minghao Liu
Junran Peng
Zhaoxiang Zhang
Jiaheng Liu
AuLLM, LRM
155
8
0
12 Oct 2025
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Zhengrong Yue
H. Zhang
Xiangyu Zeng
Boyu Chen
Chenting Wang
...
Lu Dong
Kunpeng Du
Yi Wang
Limin Wang
Yali Wang
183
7
0
12 Oct 2025
CoIDO: Efficient Data Selection for Visual Instruction Tuning via Coupled Importance-Diversity Optimization
Yichen Yan
Ming Zhong
Qi Zhu
Xiaoling Gu
Jinpeng Chen
Huan Li
119
0
0
11 Oct 2025
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
Yifan Li
Z. Chen
Z. F. Wu
Kun Zhou
Ruipu Luo
Can Zhang
Z. He
Yufei Zhan
Wayne Xin Zhao
Minghui Qiu
LRM, VLM
146
1
0
10 Oct 2025
BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception
Junyan Ye
Dongzhi Jiang
Jun-Jian He
Baichuan Zhou
Zilong Huang
Zhiyuan Yan
Jiaming Song
Conghui He
Weijia Li
ReLM, VLM, LRM
121
2
0
10 Oct 2025
MomentSeg: Moment-Centric Sampling for Enhanced Video Pixel Understanding
Ming Dai
Sen Yang
Boqiang Duan
Wankou Yang
Jingdong Wang
VOS
281
0
0
10 Oct 2025
LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition
Yushuo Zheng
Zicheng Zhang
Xiongkuo Min
Huiyu Duan
Guangtao Zhai
117
1
0
10 Oct 2025
UniVideo: Unified Understanding, Generation, and Editing for Videos
Cong Wei
Quande Liu
Zixuan Ye
Qiulin Wang
Xintao Wang
Pengfei Wan
Kun Gai
Wenhu Chen
VGen
262
14
0
09 Oct 2025
The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
Mansi Sakarvadia
Kareem Hegazy
A. Totounferoush
Kyle Chard
Yaoqing Yang
Ian Foster
Michael W. Mahoney
SupR
275
25
0
08 Oct 2025
AVO: Amortized Value Optimization for Contact Mode Switching in Multi-Finger Manipulation
Adam Hung
Fan Yang
Abhinav Kumar
Sergio Aguilera Marinovic
Soshi Iba
Rana Soltani Zarrin
Dmitry Berenson
123
3
0
08 Oct 2025
Automated Repeatable Adversary Threat Emulation with Effects Language (EL)
Suresh Damodaran
Paul D. Rowe
AAML
132
9
0
07 Oct 2025
The Artificial Intelligence Cognitive Examination: A Survey on the Evolution of Multimodal Evaluation from Recognition to Reasoning
Mayank Ravishankara
Varindra V. Persad Maharaj
ELM
202
1
0
05 Oct 2025
AgriGPT-VL: Agricultural Vision-Language Understanding Suite
Bo Yang
Yunkui Chen
Lanfei Feng
Y. Zhang
Xiao-Qiang Xu
...
Nueraili Aierken
Runhe Huang
Hongjian Lin
Yibin Ying
Shijian Li
VLM
311
4
0
05 Oct 2025
What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models
Zicong He
Boxuan Zhang
Weihao Liu
Ruixiang Tang
Lu Cheng
ELM
138
1
0
05 Oct 2025
Don't Just Chase "Highlighted Tokens" in MLLMs: Revisiting Visual Holistic Context Retention
Xin Zou
Di Lu
Yizhou Wang
Yibo Yan
Yuanhuiyi Lyu
Xu Zheng
Linfeng Zhang
Xuming Hu
VLM
281
6
0
03 Oct 2025
ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
Krishna Teja Chitty-Venkata
M. Emani
MLLM, VGen, LRM, VLM
190
1
0
02 Oct 2025
Growing Visual Generative Capacity for Pre-Trained MLLMs
Hanyu Wang
Jiaming Han
Ziyan Yang
Qi Zhao
Shanchuan Lin
Xiangyu Yue
Abhinav Shrivastava
Zhenheng Yang
Hao Chen
VLM
201
0
0
02 Oct 2025
RefineShot: Rethinking Cinematography Understanding with Foundational Skill Evaluation
Hang Wu
Yujun Cai
Haonan Ge
H. Chen
Ming-Hsuan Yang
Yiwei Wang
CoGe
175
0
0
02 Oct 2025
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Zichen Wen
Shaobo Wang
Yufa Zhou
J. Zhang
Qintong Zhang
...
Zhaorun Chen
Bin Wang
W. Li
Conghui He
Linfeng Zhang
VLM
176
8
0
01 Oct 2025
Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs
Leyla Mirvakhabova
B. Bejnordi
Gaurav Kumar
Hanxue Liang
Wanru Zhao
Paul N. Whatmough
MoE
96
0
0
01 Oct 2025
Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
Nilay Naharas
Dang Nguyen
Nesihan Bulut
M. Bateni
Vahab Mirrokni
Baharan Mirzasoleiman
104
0
0
01 Oct 2025
VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
Peng Liu
H. Shen
Chunxin Fang
Zhicheng Sun
Jiajia Liao
T. Zhao
MLLM, ObjD, VLM, LRM
215
2
0
30 Sep 2025
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Junlin Han
Shengbang Tong
David Fan
Yufan Ren
Koustuv Sinha
Juil Sock
Filippos Kokkinos
LRM, VLM
201
7
0
30 Sep 2025
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
Yuansen Liu
Haiming Tang
Jinlong Peng
Jiangning Zhang
Xiaozhong Ji
...
Chaoyou Fu
Chengjie Wang
Xiaobin Hu
Shuicheng Yan
VLM
241
1
0
30 Sep 2025
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
Yang Chen
Minghao Liu
Yufan Shen
Y. Li
Tianyuan Huang
...
Zhi Yu
Yongliang Shen
Yu Qiao
Ding Wang
VGen, VLM
258
0
0
29 Sep 2025
Mitigating Visual Hallucinations via Semantic Curriculum Preference Optimization in MLLMs
Yuanshuai Li
Yuping Yan
Junfeng Tang
Yunxuan Li
Zeqi Zheng
Yaochu Jin
132
0
0
29 Sep 2025
From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
Chenyue Zhou
Mingxuan Wang
Yanbiao Ma
Chenxu Wu
Wanyi Chen
...
Guoli Jia
Lingling Li
Z. Lu
Y. Lu
Wenhan Luo
LRM
448
9
0
29 Sep 2025
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Jitai Hao
Hao Liu
Xinyan Xiao
Qiang Huang
Jun Yu
223
0
0
29 Sep 2025
HIVTP: A Training-Free Method to Improve VLMs Efficiency via Hierarchical Visual Token Pruning Using Middle-Layer-Based Importance Score
Jingqi Xu
Jingxi Lu
Chenghao Li
Sreetama Sarkar
Peter A. Beerel
166
1
0
28 Sep 2025
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
Lin Long
Changdae Oh
Seongheon Park
Yixuan Li
VLM, MLLM
187
1
1
27 Sep 2025
GaussianVision: Vision-Language Alignment from Compressed Image Representations using 2D Gaussian Splatting
Yasmine Omri
Connor Ding
Tsachy Weissman
Thierry Tambe
3DGS, VLM
291
0
0
26 Sep 2025
Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching
Zhengyan Wan
Yidong Ouyang
Liyan Xie
Fang Fang
Hongyuan Zha
Guang Cheng
159
0
0
26 Sep 2025