arXiv: 2307.06281
MMBench: Is Your Multi-modal Model an All-around Player?

European Conference on Computer Vision (ECCV), 2024
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin

Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"

50 / 672 papers shown
Title
MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams
Pengfei Zhou
Xiaopeng Peng
Fanrui Zhang
Zhaopan Xu
Jiaxin Ai
...
Kai Wang
Xiaojun Chang
Wenqi Shao
Yang You
Kaipeng Zhang
ELM
68
3
0
09 Aug 2025
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
Zhangquan Chen
Ruihui Zhao
Chuwei Luo
Mingze Sun
Xinlei Yu
Yangyang Kang
Ruqi Huang
LRM
197
4
0
08 Aug 2025
Fourier-VLM: Compressing Vision Tokens in the Frequency Domain for Large Vision-Language Models
Huanyu Wang
Jushi Kai
Haoli Bai
Lu Hou
Bo Jiang
Ziwei He
Zhouhan Lin
VLM
94
0
0
08 Aug 2025
Δ-AttnMask: Attention-Guided Masked Hidden States for Efficient Data Selection and Augmentation
Jucheng Hu
Suorong Yang
Dongzhan Zhou
92
0
0
08 Aug 2025
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang
Runsen Xu
Chenhang Cui
Tai Wang
Dahua Lin
Jiangmiao Pang
108
2
0
07 Aug 2025
Boosting Visual Knowledge-Intensive Training for LVLMs Through Causality-Driven Visual Object Completion
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Qingguo Hu
Ante Wang
Jia Song
Delai Qiu
Qingsong Liu
Jinsong Su
VLM, LRM
98
1
0
06 Aug 2025
Training-Free Multimodal Large Language Model Orchestration
Tianyu Xie
Yuhang Wu
Yongdong Luo
Jinfa Huang
Xiawu Zheng
120
0
0
06 Aug 2025
Beyond the Visible: Benchmarking Occlusion Perception in Multimodal Large Language Models
Zhaochen Liu
Kaiwen Gao
Shuyi Liang
Bin Xiao
Limeng Qiao
Lin Ma
Tingting Jiang
126
2
0
06 Aug 2025
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
Zichen Tang
Haihong E
Jiacheng Liu
Zhongjun Yang
Rongjin Li
...
Yiling Huang
Xinyi Hu
Qing Huang
Zijian Xie
Shiyao Peng
140
1
0
06 Aug 2025
OmniPlay: Benchmarking Omni-Modal Models on Omni-Modal Game Playing
Fuqing Bie
Shiyu Huang
Xijia Tao
Zhiqin Fang
Leyi Pan
Junzhe Chen
Min Ren
Liuyu Xiang
Zhaofeng He
140
0
0
06 Aug 2025
X-SAM: From Segment Anything to Any Segmentation
Hao Wang
Limeng Qiao
Zequn Jie
Zhijian Huang
Chengjian Feng
Qingfang Zheng
Lin Ma
X. Lan
Xiaodan Liang
VLM
113
5
0
06 Aug 2025
VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction Editing Data and Long Captions
Ziteng Wang
Siqi Yang
Limeng Qiao
Lin Ma
VLM
341
0
0
04 Aug 2025
Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models
Mingyu Fu
Wei Suo
Ji Ma
Lin Yuanbo Wu
Peng Wang
Yanning Zhang
VLM
142
1
0
02 Aug 2025
HiPrune: Training-Free Visual Token Pruning via Hierarchical Attention in Vision-Language Models
Jizhihui Liu
Feiyi Du
Guangdao Zhu
Niu Lian
Jun Li
Bin Chen
VLM
110
1
0
01 Aug 2025
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
Hao Tang
Chenwei Xie
Xiaoyi Bao
Tingyu Weng
P. Li
Yun Zheng
Liwei Wang
158
10
0
31 Jul 2025
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko
Ji Soo Lee
M. Choi
Zihang Meng
Hyunwoo J. Kim
292
1
0
31 Jul 2025
Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant Layers
Ji Ma
Wei Suo
Peng Wang
Yanning Zhang
VLM
101
1
0
31 Jul 2025
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
Zhixiang Wei
Guangting Wang
Xiaoxiao Ma
Ke Mei
Huajun Chen
Yi-jing Jin
Fengyun Rao
CLIP, MLLM, VLM
149
5
0
30 Jul 2025
Doctor Sun: A Bilingual Multimodal Large Language Model for Biomedical AI
Dong Xue
Ziyao Shao
Zhaoyang Duan
Fangzhou Liu
Bing Li
Zhongheng Zhang
LM&MA
90
0
0
30 Jul 2025
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
Zigang Geng
Y. Wang
Yeyao Ma
Chen Li
Yongming Rao
...
Han Hu
Xiaosong Zhang
Linus
Di Wang
Jie Jiang
154
27
0
29 Jul 2025
MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Shaojun E
Yuchen Yang
Jiaheng Wu
Yan Zhang
Tiejun Zhao
Ziyan Chen
162
0
0
29 Jul 2025
TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model
Ao Li
Yuxiang Duan
Jinghui Zhang
Congbo Ma
Yutong Xie
G. Carneiro
Mohammad Yaqub
Hu Wang
103
0
0
28 Jul 2025
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
Yuchen Liu
Yaoming Wang
Bowen Shi
Xiaopeng Zhang
Wenrui Dai
Chenglin Li
Hongkai Xiong
Qi Tian
116
1
0
28 Jul 2025
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
Kele Shao
Keda Tao
Kejia Zhang
Sicheng Feng
Mu Cai
Yuzhang Shang
Haoxuan You
Can Qin
Yang Sui
Huan Wang
481
10
0
27 Jul 2025
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
X. Wang
Zhenyu Wu
JingJing Xie
Zichen Ding
Bowen Yang
...
Weijie Su
X. Zhu
Wei Shen
Jifeng Dai
Wenhai Wang
LLMAG
234
18
0
25 Jul 2025
Pixels to Principles: Probing Intuitive Physics Understanding in Multimodal Language Models
Mohamad Ballout
Serwan Jassim
Elia Bruni
106
0
0
22 Jul 2025
MMCircuitEval: A Comprehensive Multimodal Circuit-Focused Benchmark for Evaluating LLMs
Chenchen Zhao
Z. Shi
Xiangyu Wen
Chengjie Liu
Yi Liu
...
Yibo Lin
Jun Yang
Ning Xu
Xi Wang
Qiang Xu
103
3
0
20 Jul 2025
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
Zhang Li
Biao Yang
Qiang Liu
Shuo Zhang
Zhiyin Ma
Liang Yin
Linger Deng
Yabo Sun
Yuliang Liu
Xiang Bai
392
0
0
08 Jul 2025
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong
S. Dong
Jin Wang
Jing Huang
Li Zhou
Zenghui Sun
Lihua Jing
Jingsong Lan
Xiaoyong Zhu
Bo Zheng
MLLM
216
3
0
07 Jul 2025
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Yana Wei
Liang Zhao
Jianjian Sun
Kangheng Lin
Jisheng Yin
...
Qi Han
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Vishal M. Patel
OffRL, ReLM, LRM, VLM
191
12
0
07 Jul 2025
HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
Rui Yu
J. Zhang
Zhenye Gan
Qingdong He
Xiaobin Hu
...
Chengjie Wang
Zhucun Xue
Chaoyou Fu
Xinwei He
Xiang Bai
VLM
97
0
0
07 Jul 2025
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
Yizhou Wang
Song Mao
Yang Chen
Yufan Shen
Yinqiao Yan
...
Botian Shi
Guohang Yan
Zhi Yu
Xuming Hu
Ding Wang
143
3
0
04 Jul 2025
PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning
Tatsuki Kawakami
Kazuki Egashira
Atsuyuki Miyai
Go Irie
Kiyoharu Aizawa
MU
327
1
0
02 Jul 2025
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
GLM-V Team
Wenyi Hong
Wenmeng Yu
Xiaohan Zhang
G. Wang
...
Bin Xu
J. Li
Minlie Huang
Yuxiao Dong
Jie Tang
MLLM, ReLM, LRM, VLM
517
11
0
01 Jul 2025
Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs
Amirmohammad Izadi
Mohammad Ali Banayeeanzade
Fatemeh Askari
Ali Rahimiakbar
Mohammad Mahdi Vahedi
Hosein Hasani
M. Baghshah
LRM
182
1
0
27 Jun 2025
Rethinking Visual Token Reduction in LVLMs under Cross-modal Misalignment
Rui Xu
Yunke Wang
Yong Luo
Bo Du
VLM
170
1
0
27 Jun 2025
OmniGen2: Exploration to Advanced Multimodal Generation
Chenyuan Wu
PengFei Zheng
Ruiran Yan
Shitao Xiao
Xin Luo
...
Defu Lian
X. Wang
Zhongyuan Wang
Tiejun Huang
Zheng Liu
MLLM, SyDa, VLM
224
150
0
23 Jun 2025
HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models
Yimu Wang
Mozhgan Nasr Azadani
Sean Sedwards
Krzysztof Czarnecki
VLM
120
1
0
23 Jun 2025
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration
Computer Vision and Pattern Recognition (CVPR), 2025
Wenyang Luo
Haina Qin
Zewen Chen
L. xilinx Wang
Dandan Zheng
Yuming Li
Yufan Liu
B. Li
Weiming Hu
217
7
0
20 Jun 2025
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
Tongtian Yue
Longteng Guo
Yepeng Tang
Zijia Zhao
Xinxin Zhu
Hua Huang
Jing Liu
MLLM, VLM
150
2
0
20 Jun 2025
FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation
Fan Yang
Yousong Zhu
Xin Li
Yufei Zhan
Hongyin Zhao
Shurong Zheng
Yaowei Wang
Ming Tang
Jinqiao Wang
MLLM, VLM
214
0
0
20 Jun 2025
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yong Man Ro
Yu-Chun Wang
Yueh-Hua Wu
VLM
287
2
0
18 Jun 2025
Show-o2: Improved Native Unified Multimodal Models
Jinheng Xie
Zhenheng Yang
Mike Zheng Shou
VGen
415
81
0
18 Jun 2025
Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning
Ankan Deria
Adinath Madhavrao Dukre
Feilong Tang
Sara Atito
Sudipta Roy
Muhammad Awais
Muhammad Haris Khan
Imran Razzak
VLM
231
0
0
18 Jun 2025
Context-Informed Grounding Supervision
Hyunji Lee
Seunghyun Yoon
Yunjae Won
Hanseok Oh
Geewook Kim
Trung H. Bui
Franck Dernoncourt
Elias Stengel-Eskin
Mohit Bansal
Minjoon Seo
LRM
234
2
0
18 Jun 2025
SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks
Zijian Song
Xiaoxin Lin
Qiuming Huang
Guangrun Wang
Liang Lin
LRM
328
5
0
17 Jun 2025
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Shaolei Zhang
Shoutao Guo
Qingkai Fang
Yan Zhou
Yang Feng
MLLM, AuLLM, VLM
230
8
0
16 Jun 2025
Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence
Jianlong Wu
Sihao Liu
Chuan Rao
Bang An
Tiancheng Shen
Juil Sock
Ming-Hsuan Yang
Bernard Ghanem
204
4
0
16 Jun 2025
Aligning MLLM Benchmark With Human Preferences via Structural Equation Modeling
Shengwu Xiong
Tianyu Zou
Cong Wang
Xuelong Li
66
0
0
13 Jun 2025
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang
Mengzhen Liu
Lichen Li
Ming Lu
Yuan Zhang
Junwen Pan
Qi She
Shanghang Zhang
VLM
364
16
0
12 Jun 2025