MMBench: Is Your Multi-modal Model an All-around Player?
arXiv: 2307.06281

European Conference on Computer Vision (ECCV), 2023
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 654 papers shown
HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation
Xiang Wang
Zhifei Zhang
Chentao Song
Zhe Lin
Yuqian Zhou
...
Haitian Zheng
Jason Kuen
Yuehuan Wang
Changxin Gao
Nong Sang
MoE
69
0
0
25 Nov 2025
Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference
Wengyi Zhan
Mingbao Lin
Zhihang Lin
Rongrong Ji
MLLM, VLM, LRM
152
0
0
24 Nov 2025
Robot-Powered Data Flywheels: Deploying Robots in the Wild for Continual Data Collection and Foundation Model Adaptation
J. Grannen
Michelle Pan
Kenneth Llontop
Cherie Ho
Mark Zolotas
Jeannette Bohg
Dorsa Sadigh
LM&Ro
202
0
0
24 Nov 2025
ConsistCompose: Unified Multimodal Layout Control for Image Composition
Xuanke Shi
B. Li
Xiaoyang Han
Zhongang Cai
Lei Yang
Dahua Lin
Quan-ding Wang
MLLM
206
0
0
23 Nov 2025
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert
Yuting Gao
Wang Lan
Hengyuan Zhao
Linjiang Huang
Si Liu
Q. Guo
MoE
96
0
0
23 Nov 2025
RoadBench: Benchmarking MLLMs on Fine-Grained Spatial Understanding and Reasoning under Urban Road Scenarios
Jun Zhang
Jie Feng
Long Chen
Junhui Wang
Zhicheng Liu
Depeng Jin
Yong Li
LRM
48
0
0
22 Nov 2025
FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning
Guoyang Xia
Yifeng Ding
Fengfa Li
Lei Ren
Wei Chen
Fangxiang Feng
Xiaojie Wang
MoE, VLM
72
0
0
22 Nov 2025
Learning to Think Fast and Slow for Visual Language Models
Chenyu Lin
Cheng Chi
Jinlin Wu
Sharon Li
Kaiyang Zhou
ReLM, VLM
161
0
0
20 Nov 2025
VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference
Ziyan Liu
Y. Chen
Hongyi Cai
Tao Lin
Shuo Yang
Zheng Liu
Bo Zhao
VLM
215
0
0
20 Nov 2025
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
Yushi Huang
Z. Wang
Zhihang Yuan
Yifu Ding
Ruihao Gong
Jinyang Guo
Xianglong Liu
Jun Zhang
MoE, VLM
156
0
0
19 Nov 2025
Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov
Ulyana Isaeva
Anton A. Emelyanov
Artem Safin
Maria Tikhonova
...
Ilseyar Alimova
A. Kapitanov
Alena Fenogenova
186
1
0
19 Nov 2025
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Duo Li
Zuhao Yang
Xiaoqin Zhang
Ling Shao
Shijian Lu
VLM
113
1
0
19 Nov 2025
First Frame Is the Place to Go for Video Content Customization
Jingxi Chen
Z. Li
Zhichao Liu
Guangyao Shi
Xiyang Wu
Fuxiao Liu
Cornelia Fermüller
Brandon Yushan Feng
Yiannis Aloimonos
DiffM, VGen
141
0
0
19 Nov 2025
When to Think and When to Look: Uncertainty-Guided Lookback
Jing Bi
Filippos Bellos
Junjia Guo
Yayuan Li
Chao Huang
...
Tang
Luchuan Song
Susan Liang
Zhongfei Zhang
LRM
197
0
0
19 Nov 2025
FlexiCup: Wireless Multimodal Suction Cup with Dual-Zone Vision-Tactile Sensing
Junhao Gong
Shoujie Li
Kit-Wa Sou
Changqing Guo
Hourong Huang
...
Yifan Xie
Chenxin Liang
Chuqiao Lyu
Xiaojun Liang
Wenbo Ding
100
1
0
18 Nov 2025
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
N Dinesh Reddy
Dylan Snyder
Lona Kiragu
Mirajul Mohin
Shahrear Bin Amin
Sudeep Pillai
28
0
0
18 Nov 2025
CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product
Kaiwen Xue
Chenglong Li
Zhonghong Ou
Guoxin Zhang
Kaoyan Lu
...
Xinyu Liu
Qunlin Chen
Weiwei Qin
Yiran Shen
Jiayi Cen
68
0
0
17 Nov 2025
Explore How to Inject Beneficial Noise in MLLMs
Ruishu Zhu
Sida Huang
Ziheng Jiao
Hongyuan Zhang
80
2
0
17 Nov 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Yunxin Li
Xinyu Chen
Shenyuan Jiang
Haoyuan Shi
Zhenyu Liu
...
Zhenran Xu
Yicheng Ma
Meishan Zhang
Baotian Hu
Min Zhang
MLLM, MoE, OSLM, VLM
451
0
0
16 Nov 2025
RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning
Jingqi Xu
Jingxi Lu
Chenghao Li
Sreetama Sarkar
Souvik Kundu
Peter A. Beerel
VLM
148
0
0
16 Nov 2025
BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections
Subin Varghese
Joshua Gao
Asad Ur Rahman
Vedhus Hoskere
100
0
0
16 Nov 2025
D$^{3}$ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs
Shuochen Chang
Xiaofeng Zhang
Qingyang Liu
Li Niu
56
0
0
15 Nov 2025
TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models
Wenhao Zhou
Hao Zheng
R. Zhao
MLLM, VLM, LRM
132
0
0
14 Nov 2025
MACEval: A Multi-Agent Continual Evaluation Network for Large Models
Z. Chen
Yuze Sun
Yuan Tian
Wenjun Zhang
Guangtao Zhai
ALM, ELM
116
0
0
12 Nov 2025
Learning with Preserving for Continual Multitask Learning
H. Wang
Siwoo Bae
Zirong Chen
Meiyi Ma
CLL
140
0
0
11 Nov 2025
RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation
Haofeng Wang
Yu Zhang
LRM
48
0
0
10 Nov 2025
Unveiling Modality Bias: Automated Sample-Specific Analysis for Multimodal Misinformation Benchmarks
Hehai Lin
Hui Liu
S. Cao
Jing Li
Haoliang Li
Wenya Wang
140
0
0
08 Nov 2025
Visual Spatial Tuning
Rui Yang
Ziyu Zhu
Yanwei Li
Jingjia Huang
Shen Yan
...
Xiangtai Li
S. Li
Wenqian Wang
Yi Lin
Hengshuang Zhao
VLM
285
4
0
07 Nov 2025
Cambrian-S: Towards Spatial Supersensing in Video
Shusheng Yang
J. Yang
Pinzhi Huang
Ellis L Brown
Zihao Yang
...
Daohan Lu
Rob Fergus
Yann LeCun
Li Fei-Fei
Saining Xie
76
8
0
06 Nov 2025
Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts
Ellis L Brown
Jihan Yang
Shusheng Yang
Rob Fergus
Saining Xie
VLM
190
4
0
06 Nov 2025
IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
Ali Faraz
Akash
Shaharukh Khan
Raja Kolla
Akshat Patidar
Suranjan Goswami
Abhinav Ravi
Chandra Khatri
Shubham Agarwal
VLM
112
0
0
06 Nov 2025
NVIDIA Nemotron Nano V2 VL
Nvidia
Amala Sanjay Deshmukh
Kateryna Chumachenko
Tuomas Rintamaki
Matthieu Le
...
Krzysztof Pawelec
Michael Evans
Katherine Luna
Jie Lou
Erick Galinkin
VLM
236
1
0
06 Nov 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Jingqi Tong
Yurong Mou
Hangcheng Li
Mingzhe Li
Y. Yang
...
Y. Zheng
Xinchi Chen
Jun Zhao
Xuanjing Huang
Xipeng Qiu
VGen, LRM
281
6
0
06 Nov 2025
MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity
Kaiyuan Zhang
Chenghao Yang
Zhoufutu Wen
Sihang Yuan
Q. Wang
...
Ge Zhang
Yi Lin
Guang Shi
Chaoyou Fu
Wenhao Huang
LRM
164
0
0
05 Nov 2025
Contamination Detection for VLMs using Multi-Modal Semantic Perturbation
J. Park
Mu Cai
Feng Yao
Jingbo Shang
Soochahn Lee
Yong Jae Lee
AAML, VLM
56
0
0
05 Nov 2025
QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
Kuei-Chun Kao
Hsu Tzu-Yin
Yunqi Hong
Ruochen Wang
Cho-Jui Hsieh
LRM
56
0
0
05 Nov 2025
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
Tianfan Peng
Yuntao Du
Pengzhou Ji
Shijie Dong
Kailin Jiang
...
Jinhe Bi
Qian Li
Wei Du
Feng Xiao
Lizhen Cui
VLM
172
0
0
04 Nov 2025
CoCoVa: Chain of Continuous Vision-Language Thought for Latent Space Reasoning
Jizheng Ma
Xiaofei Zhou
Yanlong Song
Han Yan
VLM, LRM
129
0
0
04 Nov 2025
Dynamic Reflections: Probing Video Representations with Text Alignment
Tyler Zhu
Tengda Han
Leonidas Guibas
Viorica Patraucean
M. Ovsjanikov
VGen
209
0
0
04 Nov 2025
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
Fangxun Shu
Yongjie Ye
Yue Liao
Zijian Kang
Weijie Yin
Jiacong Wang
Xiao Liang
Shuicheng Yan
Chao Feng
OffRL, ReLM, LRM
189
1
0
04 Nov 2025
Dynamic Routing Between Experts: A Data-Efficient Approach to Continual Learning in Vision-Language Models
Jay Mohta
Kenan E. Ak
Dimitrios Dimitriadis
Yan Xu
Mingwei Shen
CLL, VLM
222
0
0
03 Nov 2025
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
Yongyuan Liang
Wei Chow
Feng Li
Ziqiao Ma
Xiyao Wang
Jiageng Mao
Jiuhai Chen
Jiatao Gu
Y. Wang
Furong Huang
LRM
160
0
0
03 Nov 2025
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
Ming Li
Jike Zhong
Shitian Zhao
H. Zhang
Shaoheng Lin
Yuxiang Lai
Chen Wei
Konstantinos Psounis
Kaipeng Zhang
EGVM, LRM, VLM
380
2
0
03 Nov 2025
Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond
Fan Zhang
Haoxuan Li
Shengju Qian
Xin Wang
Zheng Lian
...
Yuan Gao
Qiankun Li
Yefeng Zheng
Zhouchen Lin
Pheng-Ann Heng
LRM
88
0
0
01 Nov 2025
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
Yuhong Liu
Beichen Zhang
Yuhang Zang
Yuhang Cao
Long Xing
Xiaoyi Dong
Haodong Duan
Dahua Lin
J. Wang
LRM
97
2
0
31 Oct 2025
MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models
Zimeng Huang
Jinxin Ke
Xiaoxuan Fan
Yufeng Yang
Yang Liu
...
Junteng Dai
Haoyi Jiang
Y. Zhou
Keze Wang
Z. Chen
LRM, VLM
231
0
0
30 Oct 2025
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
Zihan Liu
Zhikang Niu
Qiuyang Xiao
Zhisheng Zheng
Ruoqi Yuan
...
Jianze Liang
Xie Chen
Leilei Sun
Dahua Lin
Jiaqi Wang
AuLLM, LRM
355
2
0
28 Oct 2025
BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning
Wentao Tan
Bowen Wang
Heng Zhi
Chenyu Liu
Z. Li
...
Chen Xu
Zhibin Wang
Tianshi Wang
Lei Zhu
Heng Tao Shen
LM&Ro
79
0
0
28 Oct 2025
Revisiting Multimodal Positional Encoding in Vision-Language Models
Jie Huang
Xuejing Liu
Sibo Song
Ruibing Hou
Hong Chang
Junyang Lin
S. Bai
112
1
0
27 Oct 2025
LightFusion: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation
Zeyu Wang
Z. Chen
Chenhui Gou
Feng Li
Chaorui Deng
...
Kunchang Li
Weihao Yu
Haoqin Tu
Haoqi Fan
Cihang Xie
202
0
0
27 Oct 2025