2307.06281
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2023
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"
50 / 672 papers shown
Title
QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
Kuei-Chun Kao
Hsu Tzu-Yin
Yunqi Hong
Ruochen Wang
Cho-Jui Hsieh
LRM
116
0
0
05 Nov 2025
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
Fangxun Shu
Yongjie Ye
Yue Liao
Zijian Kang
Weijie Yin
Jiacong Wang
Xiao Liang
Shuicheng Yan
Chao Feng
OffRL
ReLM
LRM
217
1
0
04 Nov 2025
CoCoVa: Chain of Continuous Vision-Language Thought for Latent Space Reasoning
Jizheng Ma
Xiaofei Zhou
Yanlong Song
Han Yan
VLM
LRM
153
1
0
04 Nov 2025
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
Tianfan Peng
Yuntao Du
Pengzhou Ji
Shijie Dong
Kailin Jiang
...
Jinhe Bi
Qian Li
Wei Du
Feng Xiao
Lizhen Cui
VLM
232
0
0
04 Nov 2025
Dynamic Reflections: Probing Video Representations with Text Alignment
Tyler Zhu
Tengda Han
Leonidas Guibas
Viorica Patraucean
M. Ovsjanikov
VGen
233
0
0
04 Nov 2025
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
Ming Li
Jike Zhong
Shitian Zhao
H. Zhang
Shaoheng Lin
Yuxiang Lai
Chen Wei
Konstantinos Psounis
Kaipeng Zhang
EGVM
LRM
VLM
428
2
0
03 Nov 2025
Dynamic Routing Between Experts: A Data-Efficient Approach to Continual Learning in Vision-Language Models
Jay Mohta
Kenan E. Ak
Dimitrios Dimitriadis
Yan Xu
Mingwei Shen
CLL
VLM
254
0
0
03 Nov 2025
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
Yongyuan Liang
Wei Chow
Feng Li
Ziqiao Ma
Xiyao Wang
Jiageng Mao
Jiuhai Chen
Jiatao Gu
Y. Wang
Furong Huang
LRM
212
1
0
03 Nov 2025
Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond
Fan Zhang
Haoxuan Li
Shengju Qian
Xin Wang
Zheng Lian
...
Yuan Gao
Qiankun Li
Yefeng Zheng
Zhouchen Lin
Pheng-Ann Heng
LRM
108
0
0
01 Nov 2025
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
Yuhong Liu
Beichen Zhang
Yuhang Zang
Yuhang Cao
Long Xing
Xiaoyi Dong
Haodong Duan
Dahua Lin
J. Wang
LRM
137
3
0
31 Oct 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM
VLM
490
2
0
31 Oct 2025
MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models
Zimeng Huang
Jinxin Ke
Xiaoxuan Fan
Yufeng Yang
Yang Liu
...
Junteng Dai
Haoyi Jiang
Y. Zhou
Keze Wang
Z. Chen
LRM
VLM
307
0
0
30 Oct 2025
BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning
Wentao Tan
Bowen Wang
Heng Zhi
Chenyu Liu
Z. Li
...
Chen Xu
Zhibin Wang
Tianshi Wang
Lei Zhu
Heng Tao Shen
LM&Ro
151
0
0
28 Oct 2025
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
Zihan Liu
Zhikang Niu
Qiuyang Xiao
Zhisheng Zheng
Ruoqi Yuan
...
Jianze Liang
Xie Chen
Leilei Sun
Dahua Lin
Jiaqi Wang
AuLLM
LRM
431
2
0
28 Oct 2025
LightFusion: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation
Zeyu Wang
Z. Chen
Chenhui Gou
Feng Li
Chaorui Deng
...
Kunchang Li
Weihao Yu
Haoqin Tu
Haoqi Fan
Cihang Xie
270
0
0
27 Oct 2025
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
Xin Jin
Siyuan Li
Siyong Jian
Kai Yu
Huan Wang
116
0
0
27 Oct 2025
Revisiting Multimodal Positional Encoding in Vision-Language Models
Jie Huang
Xuejing Liu
Sibo Song
Ruibing Hou
Hong Chang
Junyang Lin
S. Bai
124
1
0
27 Oct 2025
PISA-Bench: The PISA Index as a Multilingual and Multimodal Metric for the Evaluation of Vision-Language Models
Patrick Haller
Fabio Barth
Jonas Golde
Georg Rehm
Alan Akbik
LRM
333
0
0
27 Oct 2025
STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models
Mahiro Ukai
Shuhei Kurita
Nakamasa Inoue
CoGe
201
0
0
26 Oct 2025
PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments
Weijie Zhou
Xuantang Xiong
Yi Peng
Manli Tao
Chaoyang Zhao
Honghui Dong
Ming Tang
Jinqiao Wang
LRM
121
1
0
24 Oct 2025
Towards Physics-informed Spatial Intelligence with Human Priors: An Autonomous Driving Pilot Study
Guanlin Wu
Boyan Su
Yang Zhao
Pu Wang
Yichen Lin
Hao Frank Yang
104
0
0
24 Oct 2025
KBE-DME: Dynamic Multimodal Evaluation via Knowledge Enhanced Benchmark Evolution
Junzhe Zhang
Huixuan Zhang
Xiaojun Wan
49
0
0
24 Oct 2025
NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation
Longtian Qiu
Shan Ning
Jiaxuan Sun
Xuming He
NoLa
OffRL
LRM
388
0
0
24 Oct 2025
Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning
Xiaohan Lan
Fanfan Liu
Haibo Qiu
Siqi Yang
Delian Ruan
Peng Shi
Lin Ma
MoE
LRM
191
0
0
23 Oct 2025
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
Guanghao Zheng
Bowen Shi
Mingxing Xu
Ruoyu Sun
Peisen Zhao
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Xiaopeng Zhang
Qi Tian
VLM
135
0
0
23 Oct 2025
HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models
Zelin Peng
Zhengqin Xu
Qingyang Liu
Xiaokang Yang
Wei Shen
161
0
0
23 Oct 2025
KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints
Kailin Jiang
Hongbo Jiang
Ning Jiang
Zhi Gao
Jinhe Bi
Yuchen Ren
B. Li
Yuntao Du
L. J. Liu
Qing Li
CLL
OffRL
KELM
VLM
199
1
0
22 Oct 2025
Unified Reinforcement and Imitation Learning for Vision-Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yong Man Ro
Yu-Chun Wang
Yueh-Hua Wu
VLM
148
1
0
22 Oct 2025
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
Xiaoxing Hu
Kaicheng Yang
Ziyang Gong
Qi Ming
Zonghao Guo
Xiang An
Ziyong Feng
Junchi Yan
Xue Yang
CLIP
VLM
195
0
0
21 Oct 2025
VAR: Visual Attention Reasoning via Structured Search and Backtracking
Wei Cai
Jian Zhao
Yuchen Yuan
T. Zhang
Ming Zhu
Haichuan Tang
Chi Zhang
Xuelong Li
OffRL
LRM
108
0
0
21 Oct 2025
UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
Da Zhang
Chenggang Rong
Bingyu Li
Feiyu Wang
Zhiyuan Zhao
Junyu Gao
Xuelong Li
VLM
CoGe
204
0
0
21 Oct 2025
Token-Level Inference-Time Alignment for Vision-Language Models
Kejia Chen
Jiawen Zhang
Jiacong Hu
Kewei Gao
Jian Lou
Zunlei Feng
Mingli Song
MLLM
VLM
249
0
0
20 Oct 2025
UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts
Fu-Yun Wang
Han Zhang
Michael Gharbi
Hongsheng Li
Taesung Park
130
0
0
20 Oct 2025
Accelerating Vision Transformers with Adaptive Patch Sizes
Rohan Choudhury
JungEun Kim
Jeongseok Lee
Eunho Yang
László A. Jeni
Kishore Venkateshan
ViT
108
1
0
20 Oct 2025
$\mathcal{V}isi\mathcal{P}runer$: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs
Yingqi Fan
Anhao Zhao
Jinlan Fu
Junlong Tong
Hui Su
Yijie Pan
Wei Zhang
Xiaoyu Shen
VLM
80
2
0
20 Oct 2025
Segmentation as A Plug-and-Play Capability for Frozen Multimodal LLMs
Jiazhen Liu
Long Chen
MLLM
VLM
140
2
0
19 Oct 2025
VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs
Jiaying Zhu
Yurui Zhu
Xin Lu
Wenrui Yan
Dong Li
Kunlin Liu
Xueyang Fu
Zheng-Jun Zha
MQ
VLM
223
0
0
18 Oct 2025
RL makes MLLMs see better than SFT
Junha Song
Sangdoo Yun
Dongyoon Han
Jaegul Choo
Byeongho Heo
OffRL
179
0
0
18 Oct 2025
SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning
Xiaojun Guo
Runyu Zhou
Yifei Wang
Qi Zhang
Chenheng Zhang
...
Xiaohan Wang
Jiajun Chai
Guojun Yin
Wei Lin
Y. Wang
LRM
VLM
132
2
0
18 Oct 2025
Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts
Yongxiang Hua
H. Cao
Zhou Tao
Bocheng Li
Zihao Wu
Chaohu Liu
Linli Xu
MoE
192
0
0
18 Oct 2025
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Young-Jun Lee
Byung-Kwan Lee
Jianshu Zhang
Yechan Hwang
ByungSoo Ko
...
Xuankun Rong
Eojin Joo
Seung-Ho Han
Bowon Ko
Ho-Jin Choi
LRM
105
1
0
18 Oct 2025
Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures
Minh Khoi Nguyen Nhat
R. Teo
Laziz U. Abdullaev
Maurice Mok
Viet-Hoang Tran
T. Nguyen
MoE
158
0
0
18 Oct 2025
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Xingrui Wang
Jiang Liu
Chao Huang
X. Yu
Ze Wang
Ximeng Sun
Jialian Wu
Alan Yuille
Emad Barsoum
Zicheng Liu
VLM
71
0
0
16 Oct 2025
Train a Unified Multimodal Data Quality Classifier with Synthetic Data
Weizhi Wang
Rongmei Lin
Shiyang Li
Colin Lockard
Ritesh Sarkhel
Sanket Lokegaonkar
Jingbo Shang
Xifeng Yan
Nasser Zalmout
Xian Li
88
0
0
16 Oct 2025
CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection
Hojun Choi
Youngsun Lim
Jaeyo Shin
Hyunjung Shim
ObjD
LRM
VLM
205
1
0
16 Oct 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Mingxuan Li
Silei Wu
Linjun Dai
Xiaohua Wang
Hanming Deng
Lewei Lu
Dahua Lin
Ziwei Liu
VLM
124
0
0
16 Oct 2025
Vision-Centric Activation and Coordination for Multimodal Large Language Models
Yunnan Wang
Fan Lu
Kecheng Zheng
Ziyuan Huang
Ziqiang Li
Wenjun Zeng
Xin Jin
MLLM
316
0
0
16 Oct 2025
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
Logan Lawrence
Oindrila Saha
Megan Wei
Chen Sun
Subhransu Maji
Grant Van Horn
132
0
0
16 Oct 2025
VisCoP: Visual Probing for Video Domain Adaptation of Vision Language Models
Dominick Reilly
Manish Kumar Govind
Le Xue
Srijan Das
VLM
116
0
0
15 Oct 2025
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
Run Luo
Xiaobo Xia
Lu Wang
Longze Chen
Renke Shan
Jing Luo
Min Yang
Tat-Seng Chua
VGen
220
4
0
15 Oct 2025