ResearchTrend.AI
MMBench: Is Your Multi-modal Model an All-around Player?


European Conference on Computer Vision (ECCV), 2023
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
arXiv: 2307.06281

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 687 papers shown
Reinforcement Learning for Large Model: A Survey
Weijia Wu
Chen Gao
Joya Chen
Kevin Lin
Qingwei Meng
Yiming Zhang
Yuke Qiu
Hong Zhou
Mike Zheng Shou
323
2
0
24 Dec 2025
Multimodal Reinforcement Learning with Agentic Verifier for AI Agents
Reuben Tan
Baolin Peng
Zhengyuan Yang
Hao Cheng
Oier Mees
...
Xiaodong Liu
Lijuan Wang
Marc Pollefeys
Yong Jae Lee
Jianfeng Gao
OffRL, LRM
193
1
0
03 Dec 2025
Jina-VLM: Small Multilingual Vision Language Model
Andreas Koukounas
Georgios Mastrapas
Florian Hönicke
Sedigheh Eslami
Guillaume Roncari
Scott Martens
Han Xiao
MLLM
379
0
0
03 Dec 2025
ViDiC: Video Difference Captioning
J. Wu
S. Li
Zhaozhou Bian
J. Chen
Runzhe Wen
An Ping
Yiwen He
Jiakai Wang
Yuanxing Zhang
Jiaheng Liu
172
0
0
03 Dec 2025
V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time Intervention
Nan Sun
Zhenyu Zhang
Xixun Lin
Kun Wang
Yanmin Shang
...
Shuohuan Wang
Yu Sun
H. Wu
Haifeng Wang
Yanan Cao
MLLM, VLM
144
0
0
03 Dec 2025
MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation
Youxin Pang
Jiajun Liu
L. Tan
Yong Zhang
Feng Gao
Xiang Deng
Zhuoliang Kang
Xiaoming Wei
Y. Liu
VGen
130
0
0
02 Dec 2025
Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs
Julian Ma
Jun Wang
Zafeirios Fountas
45
0
0
02 Dec 2025
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models
Zhongyu Yang
Dannong Xu
Wei Pang
Yingfang Yuan
VLM
193
0
0
01 Dec 2025
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Le Thien Phuc Nguyen
Zhuoran Yu
Samuel Low Yu Hang
Subin An
J. Lee
...
SeungEun Chung
Thanh-Huy Nguyen
JuWan Maeng
Soochahn Lee
Yong Jae Lee
AuLLM, VLM
201
3
0
01 Dec 2025
Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation
Chengzhi Yu
Yifan Xu
Yifan Chen
Wenyi Zhang
MLLM, OffRL
280
1
0
30 Nov 2025
REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories
Jacob Thompson
Emiliano Garcia-Lopez
Yonatan Bisk
LRM
137
0
0
30 Nov 2025
When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI
Yanhui Li
Qi Zhou
Zhihong Xu
Huizhong Guo
Wenhai Wang
Dongxia Wang
VLM
98
0
0
29 Nov 2025
Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
Jiazhen Liu
Mingkuan Feng
Long Chen
92
0
0
29 Nov 2025
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Sinan Du
Jiahao Guo
Bo Li
Shuhao Cui
Zhengzhuo Xu
...
Yongxian Wei
Kun Gai
X. Wang
Kai Wu
C. Yuan
224
1
0
28 Nov 2025
Visual Generation Tuning
Jiahao Guo
Sinan Du
J. Yao
Wenyu Liu
Bo Li
Haoxiang Cao
Kun Gai
C. Yuan
Kai Wu
Xinggang Wang
VLM
304
0
0
28 Nov 2025
WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios
Eun Chang
Z. Huang
Yiwei Liao
Sagar Ravi Bhavsar
Amogh Param
...
Babak Damavandi
Rakesh Wanga
Anuj Kumar
Rohit Patel
Xin Luna Dong
79
0
0
27 Nov 2025
Unexplored flaws in multiple-choice VQA evaluations
Fabio Rosenthal
Sebastian Schmidt
Thorsten Graf
Thorsten Bagodonat
Stephan Günnemann
Leo Schwinn
67
0
0
27 Nov 2025
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
Yiming Chen
Junlin Han
Tianyi Bai
Shengbang Tong
Filippos Kokkinos
Philip Torr
88
0
0
27 Nov 2025
CaptionQA: Is Your Caption as Useful as the Image Itself?
Shijia Yang
Yunong Liu
Bohan Zhai
Ximeng Sun
Zicheng Liu
E. Barsoum
Manling Li
Chenfeng Xu
CoGe
204
0
0
26 Nov 2025
Object-Centric Vision Token Pruning for Vision Language Models
Guangyuan Li
R. Zhao
Jinhong Deng
Yanbo Wang
Joni Pajarinen
VLM
197
0
0
25 Nov 2025
HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation
Xiang Wang
Zhifei Zhang
Chentao Song
Zhe Lin
Yuqian Zhou
...
Haitian Zheng
Jason Kuen
Yuehuan Wang
Changxin Gao
Nong Sang
MoE
175
1
0
25 Nov 2025
UniGame: Turning a Unified Multimodal Model Into Its Own Adversary
Zhaolong Su
Wang Lu
Hao Chen
Sharon Li
Jindong Wang
166
0
0
24 Nov 2025
Robot-Powered Data Flywheels: Deploying Robots in the Wild for Continual Data Collection and Foundation Model Adaptation
J. Grannen
Michelle Pan
Kenneth Llontop
Cherie Ho
Mark Zolotas
Jeannette Bohg
Dorsa Sadigh
LM&Ro
343
0
0
24 Nov 2025
Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference
Wengyi Zhan
Mingbao Lin
Zhihang Lin
Rongrong Ji
MLLM, VLM, LRM
227
0
0
24 Nov 2025
Self-Empowering VLMs: Achieving Hierarchical Consistency via Self-Elicited Knowledge Distillation
Wei Yang
Yiran Zhu
Zilin Li
Xunjia Zhang
Hongtao Wang
VLM
137
0
0
23 Nov 2025
ConsistCompose: Unified Multimodal Layout Control for Image Composition
Xuanke Shi
B. Li
Xiaoyang Han
Zhongang Cai
Lei Yang
Dahua Lin
Quan-ding Wang
MLLM
389
0
0
23 Nov 2025
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert
Yuting Gao
Wang Lan
Hengyuan Zhao
Linjiang Huang
Si Liu
Q. Guo
MoE
168
0
0
23 Nov 2025
FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning
Guoyang Xia
Yifeng Ding
Fengfa Li
Lei Ren
Wei Chen
Fangxiang Feng
Xiaojie Wang
MoE, VLM
187
0
0
22 Nov 2025
RoadBench: Benchmarking MLLMs on Fine-Grained Spatial Understanding and Reasoning under Urban Road Scenarios
Jun Zhang
Jie Feng
Long Chen
Junhui Wang
Zhicheng Liu
Depeng Jin
Yong Li
LRM
127
0
0
22 Nov 2025
VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference
Ziyan Liu
Y. Chen
Hongyi Cai
Tao Lin
Shuo Yang
Zheng Liu
Bo Zhao
VLM
323
0
0
20 Nov 2025
Learning to Think Fast and Slow for Visual Language Models
Chenyu Lin
Cheng Chi
Jinlin Wu
Sharon Li
Kaiyang Zhou
ReLM, VLM
226
0
0
20 Nov 2025
First Frame Is the Place to Go for Video Content Customization
Jingxi Chen
Z. Li
Zhichao Liu
Guangyao Shi
Xiyang Wu
Fuxiao Liu
Cornelia Fermüller
Brandon Yushan Feng
Yiannis Aloimonos
DiffM, VGen
204
0
0
19 Nov 2025
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
Yushi Huang
Z. Wang
Zhihang Yuan
Yifu Ding
Ruihao Gong
Jinyang Guo
Xianglong Liu
Jun Zhang
MoE, VLM
265
1
0
19 Nov 2025
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Duo Li
Zuhao Yang
Xiaoqin Zhang
Ling Shao
Shijian Lu
VLM
157
1
0
19 Nov 2025
Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov
Ulyana Isaeva
Anton A. Emelyanov
Artem Safin
Maria Tikhonova
...
Ilseyar Alimova
A. Kapitanov
Alena Fenogenova
341
1
0
19 Nov 2025
When to Think and When to Look: Uncertainty-Guided Lookback
Jing Bi
Filippos Bellos
Junjia Guo
Yayuan Li
Chao Huang
...
Luchuan Song
Susan Liang
Zhongfei Zhang
LRM
290
0
0
19 Nov 2025
FlexiCup: Wireless Multimodal Suction Cup with Dual-Zone Vision-Tactile Sensing
Junhao Gong
Shoujie Li
Kit-Wa Sou
Changqing Guo
Hourong Huang
...
Yifan Xie
Chenxin Liang
Chuqiao Lyu
Xiaojun Liang
Wenbo Ding
153
2
0
18 Nov 2025
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
N Dinesh Reddy
Dylan Snyder
Lona Kiragu
Mirajul Mohin
Shahrear Bin Amin
Sudeep Pillai
99
0
0
18 Nov 2025
CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product
Kaiwen Xue
Chenglong Li
Zhonghong Ou
Guoxin Zhang
Kaoyan Lu
...
Xinyu Liu
Qunlin Chen
Weiwei Qin
Yiran Shen
Jiayi Cen
129
0
0
17 Nov 2025
Explore How to Inject Beneficial Noise in MLLMs
Ruishu Zhu
Sida Huang
Ziheng Jiao
Hongyuan Zhang
210
5
0
17 Nov 2025
BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections
Subin Varghese
Joshua Gao
Asad Ur Rahman
Vedhus Hoskere
165
0
0
16 Nov 2025
RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning
Jingqi Xu
Jingxi Lu
Chenghao Li
Sreetama Sarkar
Souvik Kundu
Peter A. Beerel
VLM
181
0
0
16 Nov 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Yunxin Li
Xinyu Chen
Shenyuan Jiang
Haoyuan Shi
Zhenyu Liu
...
Zhenran Xu
Yicheng Ma
Meishan Zhang
Baotian Hu
Min Zhang
MLLM, MoE, OSLM, VLM
625
1
0
16 Nov 2025
D³ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs
Shuochen Chang
Xiaofeng Zhang
Qingyang Liu
Li Niu
84
0
0
15 Nov 2025
TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models
Wenhao Zhou
Hao Zheng
R. Zhao
MLLM, VLM, LRM
170
0
0
14 Nov 2025
MACEval: A Multi-Agent Continual Evaluation Network for Large Models
Z. Chen
Yuze Sun
Yuan Tian
Wenjun Zhang
Guangtao Zhai
ALM, ELM
234
1
0
12 Nov 2025
Learning with Preserving for Continual Multitask Learning
H. Wang
Siwoo Bae
Zirong Chen
Meiyi Ma
CLL
197
0
0
11 Nov 2025
RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation
Haofeng Wang
Yu Zhang
LRM
93
0
0
10 Nov 2025
Unveiling Modality Bias: Automated Sample-Specific Analysis for Multimodal Misinformation Benchmarks
Hehai Lin
Hui Liu
S. Cao
Jing Li
Haoliang Li
Wenya Wang
232
0
0
08 Nov 2025
Visual Spatial Tuning
Rui Yang
Ziyu Zhu
Yanwei Li
Jingjia Huang
Shen Yan
...
Xiangtai Li
S. Li
Wenqian Wang
Yi Lin
Hengshuang Zhao
VLM
347
7
0
07 Nov 2025