Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2307.06281
Cited By
v1
v2
v3
v4 (latest)
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2023
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (5 upvotes)
Papers citing
"MMBench: Is Your Multi-modal Model an All-around Player?"
50 / 672 papers shown
Title
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
Zhongyu Wei
LRM
MLLM
609
31
0
27 May 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
323
64
0
26 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Wanrong Zhu
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
426
62
0
24 May 2024
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
International Conference on Learning Representations (ICLR), 2024
Yongxin Guo
Zhenglin Cheng
Xiaoying Tang
Tao Lin
Tao Lin
MoE
444
28
0
23 May 2024
C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning
Ji Ma
Wei Suo
Peng Wang
Yanning Zhang
VLM
229
0
0
21 May 2024
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
Jiaqi Li
Qianshan Wei
Chuanyi Zhang
Guilin Qi
Miaozeng Du
Yongrui Chen
Sheng Bi
Fan Liu
VLM
MU
436
30
0
21 May 2024
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Zhenwei Shao
Zhou Yu
Jun Yu
Xuecheng Ouyang
Lihao Zheng
Zhenbiao Gai
Mingyang Wang
Jiajun Ding
207
23
0
20 May 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
253
84
0
17 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
377
27
0
08 May 2024
What matters when building vision-language models?
Neural Information Processing Systems (NeurIPS), 2024
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
280
270
0
03 May 2024
MANTIS: Interleaved Multi-Image Instruction Tuning
Dongfu Jiang
Xuan He
Huaye Zeng
Cong Wei
Max Ku
Qian Liu
Wenhu Chen
VLM
MLLM
346
178
0
02 May 2024
TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
Yoonsik Kim
Moonbin Yim
Ka Yeon Song
LMTD
284
40
0
30 Apr 2024
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models
Wei Li
Ren Ma
Jiang Wu
Chenya Gu
Jiahui Peng
Jinyang Len
Songyang Zhang
Hang Yan
Dahua Lin
Conghui He
ELM
138
1
0
29 Apr 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
576
303
0
29 Apr 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
458
961
0
25 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
300
73
0
24 Apr 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRM
ALM
548
1,840
0
22 Apr 2024
An empirical study of LLaMA3 quantization: from LLMs to MLLMs
Wei Huang
Xingyu Zheng
Xudong Ma
Haotong Qin
Chengtao Lv
Hong Chen
Jie Luo
Xiaojuan Qi
Xianglong Liu
Michele Magno
MQ
461
64
0
22 Apr 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
333
228
0
22 Apr 2024
UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark
Zhaokun Zhou
Qiulin Wang
Bin Lin
Yiwei Su
Ruoxin Chen
Xin Tao
Amin Zheng
Li-xin Yuan
Pengfei Wan
Di Zhang
123
29
0
15 Apr 2024
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Ya-Qi Yu
Minghui Liao
Jihao Wu
Yongxin Liao
Xiaoyu Zheng
Wei Zeng
VLM
197
21
0
14 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Shiyang Feng
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Jiaming Song
VLM
343
84
0
29 Mar 2024
Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models
Zuyan Liu
Yuhao Dong
Yongming Rao
Jie Zhou
Jiwen Lu
LRM
186
41
0
19 Mar 2024
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
European Conference on Computer Vision (ECCV), 2024
Yifan Li
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
449
90
0
14 Mar 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
European Conference on Computer Vision (ECCV), 2024
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MLLM
283
96
0
14 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
401
627
0
08 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Nanyi Fei
Guoxing Yang
Zhiwu Lu
278
4
0
07 Mar 2024
TempCompass: Do Video LLMs Really Understand Videos?
Yuanxin Liu
Shicheng Li
Yi Liu
Yuxiang Wang
Shuhuai Ren
Lei Li
Sishuo Chen
Xu Sun
Lu Hou
VLM
410
211
0
01 Mar 2024
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song
Mengyue Wu
Ke Zhu
Chunhao Zhang
Yanyi Chen
LRM
ELM
386
4
0
28 Feb 2024
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Yiyang Zhou
Chenhang Cui
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
VLM
MLLM
235
161
0
18 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Jiaming Song
Yu Qiao
Shiyang Feng
MLLM
437
135
0
08 Feb 2024
MULTI: Multimodal Understanding Leaderboard with Text and Images
Zichen Zhu
Yang Xu
Lu Chen
Jingkai Yang
Yichuan Ma
...
Yingzi Ma
Situo Zhang
Zihan Zhao
Liangtai Sun
Kai Yu
VLM
314
6
0
05 Feb 2024
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
Chenhui Zhang
Sherrie Wang
247
35
0
31 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Yuan Liu
VLM
MLLM
349
337
0
29 Jan 2024
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Xue Sun
Xinya Wu
Pengfei Zhou
Richeng Xuan
Guang Liu
Xi Yang
Qiannan Zhu
Hua Huang
ELM
LRM
234
28
0
25 Jan 2024
MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuhao Wang
Yusheng Liao
Heyang Liu
Hongcheng Liu
Yu Wang
Yanfeng Wang
LRM
VLM
238
18
0
15 Jan 2024
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs
Daoan Zhang
Junming Yang
Hanjia Lyu
Zijian Jin
Xingtai Lv
Mingkai Chen
Jiebo Luo
270
58
0
05 Jan 2024
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2024
Hongzhan Lin
Ziyang Luo
Bo Wang
Ruichao Yang
Jing Ma
450
44
0
03 Jan 2024
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
166
30
0
27 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
321
191
0
11 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
366
10
0
11 Dec 2023
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Shitian Zhao
Zhuowan Li
Yadong Lu
Yaoyao Liu
Yan Wang
LRM
159
14
0
09 Dec 2023
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
Junyu Lu
Ruyi Gan
Di Zhang
Xiaojun Wu
Ziwei Wu
Renliang Sun
Jiaxing Zhang
Pingjian Zhang
Yan Song
MLLM
VLM
188
21
0
08 Dec 2023
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Computer Vision and Pattern Recognition (CVPR), 2023
Mu Cai
Haotian Liu
Dennis Park
Siva Karthik Mustikovela
Gregory P. Meyer
Yuning Chai
Yong Jae Lee
VLM
LRM
MLLM
297
143
0
01 Dec 2023
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Computer Vision and Pattern Recognition (CVPR), 2023
Qidong Huang
Xiao-wen Dong
Pan Zhang
Bin Wang
Conghui He
Yuan Liu
Dahua Lin
Weiming Zhang
Neng H. Yu
MLLM
423
349
0
29 Nov 2023
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu
Chenhang Cui
Zijun Wang
Yiyang Zhou
Bingchen Zhao
Junlin Han
Wangchunshu Zhou
Huaxiu Yao
Cihang Xie
MLLM
324
101
0
27 Nov 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Computer Vision and Pattern Recognition (CVPR), 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
833
1,564
0
27 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoV
LRM
184
45
0
27 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
European Conference on Computer Vision (ECCV), 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Yuan Liu
Feng Zhao
Dahua Lin
MLLM
VLM
343
919
0
21 Nov 2023
KNVQA: A Benchmark for evaluation knowledge-based VQA
Sirui Cheng
Siyu Zhang
Jiayi Wu
Muchen Lan
184
1
0
21 Nov 2023
Previous
1
2
3
...
12
13
14
Next