2307.06281
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2024
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
Papers citing "MMBench: Is Your Multi-modal Model an All-around Player?"
50 / 687 papers shown
Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model
Shiryu Ueno
Yoshikazu Hayashi
Shunsuke Nakatsuka
Yusei Yamada
Hiroaki Aizawa
K. Kato
MLLM
VLM
13 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
12 Feb 2025
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
International Conference on Learning Representations (ICLR), 2025
Yi Li
Yuquan Deng
Jing Zhang
Joel Jang
Marius Memme
...
Fabio Ramos
Dieter Fox
Anqi Li
Abhishek Gupta
Ankit Goyal
LM&Ro
08 Feb 2025
PixelWorld: How Far Are We from Perceiving Everything as Pixels?
Zhiheng Lyu
Xueguang Ma
Wenhu Chen
31 Jan 2025
Benchmarking Gaslighting Negation Attacks Against Multimodal Large Language Models
Bin Zhu
Hui yan Qi
Yinxuan Gui
Yue Yu
Chong-Wah Ngo
Ee-Peng Lim
31 Jan 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
28 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
Feiyu Xiong
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
21 Jan 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ziyang Chen
Mingxiao Li
Zhongfu Chen
Nan Du
Xiaolong Li
Yuexian Zou
19 Jan 2025
Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces
Amirreza Payandeh
Daeun Song
Mohammad Nazeri
Jing Liang
Praneel Mukherjee
Amir Hossain Raj
Yangzhe Kong
Dinesh Manocha
Xuesu Xiao
LM&Ro
LRM
17 Jan 2025
LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models
Mozhgan Nasr Azadani
James Riddell
Sean Sedwards
Krzysztof Czarnecki
MLLM
VLM
13 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Computer Vision and Pattern Recognition (CVPR), 2024
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Yuan Liu
Kaipeng Zhang
Dahua Lin
Yu Qiao
Shiyang Feng
Xiangyu Yue
MLLM
10 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
International Conference on Learning Representations (ICLR), 2025
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLM
VLM
07 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
...
Yunhai Tong
Lu Qi
Jiashi Feng
Ming-Hsuan Yang
VLM
07 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
VLM
05 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Neural Information Processing Systems (NeurIPS), 2024
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
03 Jan 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Yuan Liu
Hengshuang Zhao
02 Jan 2025
Diving into Self-Evolving Training for Multimodal Reasoning
Wei Liu
Junlong Li
Xiwen Zhang
Fan Zhou
Yu Cheng
Junxian He
LRM
ReLM
23 Dec 2024
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Yeyuan Wang
D. Gao
Bin Li
Rujiao Long
Lei Yi
Xiaoyan Cai
Libin Yang
Jinxia Zhang
Jinsong Chen
Qi Xuan
22 Dec 2024
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Computer Vision and Pattern Recognition (CVPR), 2024
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
20 Dec 2024
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Computer Vision and Pattern Recognition (CVPR), 2024
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
18 Dec 2024
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Seunghee Kim
Changhyeon Kim
Taeuk Kim
LRM
17 Dec 2024
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Computer Vision and Pattern Recognition (CVPR), 2024
Hao Li
Changyao Tian
Jie Shao
X. Zhu
Zhaokai Wang
...
Wenhan Dou
Xiaogang Wang
Jiaming Song
Lewei Lu
Jifeng Dai
MLLM
12 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Computer Vision and Pattern Recognition (CVPR), 2024
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Juil Sock
VLM
ObjD
12 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Mingyu Ding
Xihui Liu
LLMAG
LRM
05 Dec 2024
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Computer Vision and Pattern Recognition (CVPR), 2024
Wangbo Zhao
Yizeng Han
Jiasheng Tang
Hao Sun
Yibing Song
Kaidi Wang
Zinan Lin
Yang You
04 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
04 Dec 2024
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong
Zhuoming Liu
Yin Li
Liwei Wang
04 Dec 2024
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
International Conference on Learning Representations (ICLR), 2024
Zhongfu Chen
Tingzhu Chen
Wenjun Zhang
Guangtao Zhai
02 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
02 Dec 2024
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Computer Vision and Pattern Recognition (CVPR), 2024
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Yong Liu
...
Steve Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
OffRL
LRM
27 Nov 2024
Evaluating Vision-Language Models as Evaluators in Path Planning
Computer Vision and Pattern Recognition (CVPR), 2024
Mohamed Aghzal
Xiang Yue
Erion Plaku
Ziyu Yao
LRM
27 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
27 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Computer Vision and Pattern Recognition (CVPR), 2024
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
25 Nov 2024
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Computer Vision and Pattern Recognition (CVPR), 2024
Ashmal Vayani
Dinura Dissanayake
Hasindri Watawana
Noor Ahsan
Nevasini Sasikumar
...
Monojit Choudhury
Ivan Laptev
Mubarak Shah
Salman Khan
Fahad A Khan
25 Nov 2024
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Computer Vision and Pattern Recognition (CVPR), 2024
Qizhou Chen
Chengyu Wang
Dakan Wang
Taolin Zhang
Wangyue Li
Xiaofeng He
KELM
23 Nov 2024
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression
Yuke Zhu
Chi Xie
Shuang Liang
Bo Zheng
Sheng Guo
21 Nov 2024
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao
Bin Zhu
Yue Yu
Chong-Wah Ngo
Yu-Gang Jiang
VLM
OffRL
19 Nov 2024
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
Ruichuan An
Sihan Yang
Ming Lu
Kai Zeng
Yulin Luo
...
Hao Liang
Qi She
Shanghang Zhang
Feiyu Xiong
Wentao Zhang
18 Nov 2024
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Computer Vision and Pattern Recognition (CVPR), 2024
Yunlong Tang
Junjia Guo
Hang Hua
Susan Liang
Mingqian Feng
...
Chao Huang
Jing Bi
Zeliang Zhang
Pooyan Fazli
Chenliang Xu
CoGe
17 Nov 2024
Multimodal Instruction Tuning with Hybrid State Space Models
Jianing Zhou
Han Li
Shuai Zhang
Ning Xie
Ruijie Wang
Xiaohan Nie
Sheng Liu
Lingyun Wang
13 Nov 2024
Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM
D. Song
Sicheng Lai
Shunian Chen
Lichao Sun
Benyou Wang
06 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Neural Information Processing Systems (NeurIPS), 2024
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
05 Nov 2024
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
Yunkai Dang
Mengxi Gao
Yibo Yan
Xin Zou
Yanggan Gu
...
Jingyu Wang
Peijie Jiang
Aiwei Liu
Jia Liu
Xuming Hu
05 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
01 Nov 2024
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Kimihiro Hasegawa
Wiradee Imrattanatrai
Zhi-Qi Cheng
Masaki Asada
Susan Holm
Yuran Wang
Ken Fukuda
Teruko Mitamura
29 Oct 2024
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
Han Bao
Yue Huang
Zixiang Xu
Jiayi Ye
Xiangqi Wang
Preslav Nakov
Mohamed Elhoseiny
Wei Wei
Xiangliang Zhang
28 Oct 2024
EfficientEQA: An Efficient Approach to Open-Vocabulary Embodied Question Answering
Kai Cheng
Zhengyuan Li
Xingpeng Sun
Byung-Cheol Min
Amrit Singh Bedi
Aniket Bera
26 Oct 2024
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
International Conference on Learning Representations (ICLR), 2024
Leander Girrbach
Yiran Huang
Stephan Alaniz
Trevor Darrell
Zeynep Akata
VLM
25 Oct 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst
Tim Nelson Tobiasch
Lukas Helff
Inga Ibs
Wolfgang Stammer
Devendra Singh Dhami
Constantin Rothkopf
Kristian Kersting
CoGe
ReLM
VLM
LRM
25 Oct 2024
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Shuhao Gu
Jialing Zhang
Siyuan Zhou
Kevin Yu
Zhaohu Xing
...
Yufeng Cui
Xinlong Wang
Yaoqi Liu
Fangxiang Feng
Guang Liu
SyDa
VLM
MLLM
24 Oct 2024
Page 10 of 14