2307.06281
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2024
12 July 2023
Yuan Liu
Haodong Duan
Yuanhan Zhang
Bo Li
Songyang Zhang
Wangbo Zhao
Yike Yuan
Jiaqi Wang
Conghui He
Ziwei Liu
Kai Chen
Dahua Lin
Papers citing
"MMBench: Is Your Multi-modal Model an All-around Player?"
50 / 687 papers shown
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Yunlong Tang
Pinxin Liu
Mingqian Feng
Rui Mao
...
Hang Hua
Ali Vosoughi
Luchuan Song
Zeliang Zhang
Chenliang Xu
LRM
473
4
0
26 May 2025
Causal-LLaVA: Causal Disentanglement for Mitigating Hallucination in Multimodal Large Language Models
Xinmiao Hu
C. Wang
Ruihe An
ChenYu Shao
Xiaojun Ye
Sheng Zhou
Liangcheng Li
MLLM
LRM
286
2
0
26 May 2025
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Hao Fang
Changle Zhou
Jiawei Kong
Kuofeng Gao
Bin Chen
Tao Liang
MLLM
444
6
0
26 May 2025
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang
Yao Lai
Aoxue Li
Shifeng Zhang
Jiacheng Sun
Ning Kang
Chengyue Wu
Zhenguo Li
Ping Luo
396
20
0
26 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning through Explicit Visual Anchoring
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
465
8
0
25 May 2025
Caption This, Reason That: VLMs Caught in the Middle
Zihan Weng
Lucas Gomez
Taylor Whittington Webb
P. Bashivan
VLM
LRM
384
0
0
24 May 2025
Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning
Ye Mo
Zirui Shao
Kai Ye
Xianwei Mao
Bo Zhang
...
Gang Huang
Kehan Chen
Zhou Huan
Zixu Yan
Sheng Zhou
LRM
298
3
0
24 May 2025
MLLMs are Deeply Affected by Modality Bias
Xu Zheng
Chenfei Liao
Yuqian Fu
Kaiyu Lei
Yuanhuiyi Lyu
...
Yu Jiang
Andrii Zadaianchuk
Dacheng Tao
Luc Van Gool
Xuming Hu
332
12
0
24 May 2025
ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
Duo Li
Zuhao Yang
Xiaoqin Zhang
Ling Shao
Shijian Lu
VLM
502
1
0
24 May 2025
CAS-IQA: Teaching Vision-Language Models for Synthetic Angiography Quality Assessment
Bo Wang
De-Xing Huang
Xiao-Hu Zhou
Mei-Jiang Gui
Nu-Fang Xiao
Jian-Long Hao
Ming-Yuan Liu
Zeng-Guang Hou
210
0
0
23 May 2025
Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yue Li
Xin Yi
Dongsheng Shi
Gerard de Melo
Xiaoling Wang
Linlin Wang
349
0
0
22 May 2025
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
Zebin You
Shen Nie
Xiaolu Zhang
Jun Hu
Jun Zhou
Zhiwu Lu
J. Wen
Chongxuan Li
MLLM
VLM
440
67
0
22 May 2025
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
Computer Vision and Pattern Recognition (CVPR), 2025
Feilong Tang
Chengzhi Liu
Zhongxing Xu
Ming Hu
Zelin Peng
...
Minquan Lin
Yifan Peng
Xuelian Cheng
Imran Razzak
Zongyuan Ge
303
20
0
22 May 2025
Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models
Chengcheng Wang
Jianyuan Guo
Hongguang Li
Yuchuan Tian
Ying Nie
Chang Xu
Kai Han
295
3
0
22 May 2025
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
365
47
0
22 May 2025
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
Penghao Wu
Lewei Lu
Ziwei Liu
285
1
0
21 May 2025
ModRWKV: Transformer Multimodality in Linear Time
Jiale Kang
Ziyin Yue
Qingyu Yin
Jiang Rui
W. Li
Zening Lu
Zhouran Ji
OffRL
234
0
0
20 May 2025
VoQA: Visual-only Question Answering
Jianing An
Luyang Jiang
Jie Luo
Wenjun Wu
Lei Huang
LRM
348
0
0
20 May 2025
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
Jiaer Xia
Yuhang Zang
Peng Gao
Shouqing Yang
Kaiyang Zhou
OffRL
ReLM
AI4TS
VLM
LRM
333
41
0
20 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Yuxin Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Mingyu Ding
LRM
314
18
0
19 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Xuetao Zhang
LRM
531
18
0
18 May 2025
Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning
Bonan Li
Zicheng Zhang
Songhua Liu
Weihao Yu
Xinchao Wang
VLM
334
2
0
17 May 2025
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
Yansheng Qiu
Li Xiao
Zhaopan Xu
Pengfei Zhou
Zheng Wang
Jianchao Tan
ELM
LRM
457
1
0
16 May 2025
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Elodie Germani
Selin Türk Ilayda
Zeineddine Fatima
Mourad Charbel
Shadi Albarqouni
AI4CE
339
3
0
14 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
282
2
0
11 May 2025
Emotion-Qwen: A Unified Framework for Emotion and Vision Understanding
Dawei Huang
Qing Li
Chuan Yan
Minghan Li
Jiaming Ji
...
Xiaobei Wang
X. Wang
Zheng Lian
Zhi-Qi Cheng
Xiaojiang Peng
316
1
0
10 May 2025
SITE: towards Spatial Intelligence Thorough Evaluation
Wenjie Wang
Reuben Tan
Pengyue Zhu
Jianwei Yang
Zhengyuan Yang
Lijuan Wang
Andrey Kolobov
Jianfeng Gao
Boqing Gong
293
6
0
08 May 2025
Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding
Jaehyun Jeon
Janghan Yoon
Minsoo Kim
Sumin Shim
Yejin Choi
Hanbin Kim
Youngjae Yu
AAML
567
0
0
08 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Gang Qu
Zhenan Sun
Mingyu Ding
MLLM
VLM
447
32
0
08 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
604
34
0
08 May 2025
Multi-Agent System for Comprehensive Soccer Understanding
Jiayuan Rao
Zhiyu Li
Haoning Wu
Yujiao Shi
Yanfeng Wang
Weidi Xie
LLMAG
387
7
0
06 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
585
10
0
05 May 2025
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
Jinpeng Chen
Runmin Cong
Yuzhi Zhao
Hongzheng Yang
Guangneng Hu
H. Ip
Sam Kwong
CLL
KELM
361
7
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
1.2K
32
0
05 May 2025
GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling
Siqi Li
Yufan Shen
Xiangnan Chen
Jiayi Chen
Hengwei Ju
...
Botian Shi
Y. Liu
Xinyu Cai
Yu Qiao
VLM
ELM
584
2
0
30 Apr 2025
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Yiming Lei
Chenkai Zhang
Ziqiang Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
AI4TS
529
0
0
30 Apr 2025
Multimodal Language Models See Better When They Look Shallower
Wei Xu
Junyan Lin
Xinhao Chen
Yue Fan
Jianfeng Dong
Hui Su
Jinlan Fu
Xiaoyu Shen
VLM
356
4
0
30 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Computer Vision and Pattern Recognition (CVPR), 2025
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
Xuzhao Li
MLLM
536
3
0
29 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
525
4
0
28 Apr 2025
Anyprefer: An Agentic Framework for Preference Data Synthesis
International Conference on Learning Representations (ICLR), 2025
Yiyang Zhou
Zhaoxiang Wang
Tianle Wang
Shangyu Xing
Peng Xia
...
Chetan Bansal
Weitong Zhang
Ying Wei
Joey Tianyi Zhou
Huaxiu Yao
445
10
0
27 Apr 2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Jing Liu
Hangyu Guo
Ranjie Duan
Xingyuan Bu
Yancheng He
...
Yingshui Tan
Yanan Wu
Jihao Gu
Yongbin Li
Jun Zhu
MLLM
1.1K
3
0
25 Apr 2025
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
Feng Chen
Yefei He
Lequan Lin
Qingbin Liu
Bohan Zhuang
Qi Wu
365
1
0
23 Apr 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
Zehao Wang
Senthil Purushwalkam
Caiming Xiong
Siyang Song
Chenhui Xu
Ran Xu
400
5
0
23 Apr 2025
Unveiling the Lack of LVLM Robustness to Fundamental Visual Variations: Why and Path Forward
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi R. Fung
630
0
0
23 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
402
23
0
20 Apr 2025
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Mengli Zhu
Weidong Zhang
...
Pengfei Li
Chong Li
Junhao Zhu
Hao Yang
Shiliang Sun
475
2
0
16 Apr 2025
Benchmarking Vision Language Models on German Factual Data
Artificial Intelligence Applications and Innovations (AIAI), 2025
René Peinl
Vincent Tischler
CoGe
345
1
0
15 Apr 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi R. Fung
Paul Pu Liang
VLM
378
3
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
672
829
1
14 Apr 2025
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Zheng Liu
Mengjie Liu
Jianfei Chen
Jingwei Xu
Tengjiao Wang
Bin Wang
Wentao Zhang
MLLM
478
3
0
14 Apr 2025