Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2307.06281
Cited By
v1
v2
v3
v4 (latest)
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2023
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (5 upvotes)
Papers citing
"MMBench: Is Your Multi-modal Model an All-around Player?"
50 / 687 papers shown
Similarity-Aware Token Pruning: Your VLM but Faster
Ahmadreza Jeddi
Negin Baghbanzadeh
Elham Dolatabadi
Babak Taati
3DV
VLM
308
9
0
14 Mar 2025
Semantic-Clipping: Efficient Vision-Language Modeling with Semantic-Guidedd Visual Selection
Bangzheng Li
Haiwei Yang
Wenxuan Zhou
Nan Xu
Ben Zhou
Sheng Zhang
Hoifung Poon
Mengzhao Chen
MLLM
VLM
328
2
0
14 Mar 2025
EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks
Yi Zhang
Qiang Zhang
Xiaozhu Ju
Ziqiang Liu
Jilei Mao
...
Jiaxu Wang
Yiqun Duan
Jiahang Cao
Zhanchen Zhu
Jian Tang
LM&Ro
LRM
288
5
0
14 Mar 2025
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu
Khoi Duc Nguyen
Preeti Mukherjee
Saurabh Bagchi
Somali Chaterji
Yingyu Liang
Yin Li
LRM
462
4
0
13 Mar 2025
Towards Understanding Graphical Perception in Large Multimodal Models
Kai Zhang
Jianwei Yang
J. Inala
Chandan Singh
Jianfeng Gao
Eric Fosler-Lussier
Chenglong Wang
322
2
0
13 Mar 2025
DAVE: Diagnostic benchmark for Audio Visual Evaluation
Gorjan Radevski
Teodora Popordanoska
Matthew B. Blaschko
Tinne Tuytelaars
271
1
0
12 Mar 2025
Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qiji Zhou
Yifan Gong
Guangsheng Bao
Hongjie Qiu
Jinqiang Li
Xiangrong Zhu
Huajian Zhang
Yue Zhang
LRM
268
3
0
12 Mar 2025
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang
Quan Cui
Bingchen Zhao
Cheng Yang
MLLM
SyDa
470
6
0
11 Mar 2025
Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
Mengqi Li
544
3
0
11 Mar 2025
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru
Dunant Cusipuma
David Ortega
Victor Flores-Benites
Arturo Deza
OOD
306
0
0
10 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL
LRM
442
23
0
10 Mar 2025
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2025
Bardia Safaei
Faizan Siddiqui
Jiacong Xu
Vishal M. Patel
Shao-Yuan Lo
VLM
1.1K
7
0
10 Mar 2025
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Sihao Lin
Chunwei Wang
Xiuwei Chen
Hongbin Xu
Jiawei Han
Xiandan Liang
J. N. Han
Hang Xu
Xiaodan Liang
VLM
719
14
0
09 Mar 2025
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
Yanling Wang
Yihan Zhao
Xiaodong Chen
Shasha Guo
Lixin Liu
Haoyang Li
Yong Xiao
Jing Zhang
Qi Li
Ke Xu
219
3
0
09 Mar 2025
Statistical Study of Sensor Data and Investigation of ML-based Calibration Algorithms for Inexpensive Sensor Modules: Experiments from Cape Point
IEEE Transactions on Instrumentation and Measurement (IEEE Trans. Instrum. Meas.), 2025
Travis Barrett
Amit Kumar Mishra
329
2
0
09 Mar 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai
Pengfei Zhou
Zhaopan Xu
Ming Li
Fanrui Zhang
...
Jianwen Sun
Yukang Feng
Baojin Huang
Zhongyuan Wang
Jianchao Tan
ELM
952
3
0
09 Mar 2025
SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai
Songyou Peng
Kyle Genova
Leonidas Guibas
Thomas Funkhouser
3DGS
598
11
0
08 Mar 2025
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
Shwai He
Weilin Cai
Jiayi Huang
Ang Li
MoE
499
3
0
07 Mar 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Siyang Song
Mohammed Irfan Kurpath
Sahal Shaji Mullappilly
Jean Lahoud
Fahad A Khan
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
AuLLM
660
6
0
06 Mar 2025
ToFu: Visual Tokens Reduction via Fusion for Multi-modal, Multi-patch, Multi-image Task
Vittorio Pippi
Matthieu Guillaumin
S. Cascianelli
Rita Cucchiara
M. Jaritz
Loris Bazzani
199
0
0
06 Mar 2025
See What You Are Told: Visual Attention Sink in Large Multimodal Models
International Conference on Learning Representations (ICLR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
380
52
0
05 Mar 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Computer Vision and Pattern Recognition (CVPR), 2025
Ailin Deng
Tri Cao
Zhirui Chen
Bryan Hooi
VLM
346
32
0
04 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2025
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
567
48
0
04 Mar 2025
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
Wei-Yao Wang
Zhao Wang
Helen Suzuki
Yoshiyuki Kobayashi
LRM
361
5
0
04 Mar 2025
Towards Enhanced Image Generation Via Multi-modal Chain of Thought in Unified Generative Models
Yi Wang
Mushui Liu
Wanggui He
Longxiang Zhang
Longxiang Zhang
...
Weilong Dai
Weilong Dai
Mingli Song
Hao Jiang
Jie Song
MLLM
MoE
LRM
494
13
0
03 Mar 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang
Chenwei Xie
Haiyang Wang
Xiaoyi Bao
Tingyu Weng
Nianzu Yang
Yun Zheng
Liwei Wang
ObjD
VLM
459
14
0
03 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Yanzhe Zhang
Xiren Zhou
MoE
SyDa
302
294
0
03 Mar 2025
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
Benjamin Schneider
Florian Kerschbaum
Wenhu Chen
975
0
0
01 Mar 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Computer Vision and Pattern Recognition (CVPR), 2025
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
496
88
0
28 Feb 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Wanrong Zhu
MoE
493
3
0
27 Feb 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Ziyi Wang
Yang Zhou
ELM
VLM
LRM
412
2
0
27 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
632
12
0
26 Feb 2025
Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference
Zhuo Chen
Xinyu Wang
Yong Jiang
Zhen Zhang
Xin Guan
Pengjun Xie
Fei Huang
Kewei Tu
443
5
0
25 Feb 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
LRM
334
12
0
24 Feb 2025
MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing
Matvey Skripkin
Elizaveta Goncharova
Dmitrii Tarasov
Andrey Kuznetsov
282
0
0
24 Feb 2025
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
Rohit Saxena
Pasquale Minervini
Frank Keller
VLM
249
8
0
24 Feb 2025
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yi-Kai Zhang
De-Chuan Zhan
Han-Jia Ye
ALM
ELM
LRM
469
18
0
24 Feb 2025
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
Fanhu Zeng
Haiyang Guo
Fei Zhu
Li Shen
Hao Tang
MoMe
621
5
0
24 Feb 2025
Testing the Limits of Fine-Tuning for Improving Visual Cognition in Vision Language Models
Luca M. Schulze Buschoff
Konstantinos Voudouris
Elif Akata
Matthias Bethge
Joshua B. Tenenbaum
Eric Schulz
VLM
LRM
450
0
1
21 Feb 2025
TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba
Xiuwei Chen
Sihao Lin
Xiao Dong
Sihao Lin
Meng Cao
Jiawei Han
Yina Zhuang
J. N. Han
Hang Xu
Xiaodan Liang
Mamba
370
4
0
21 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Neural Information Processing Systems (NeurIPS), 2024
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
417
14
0
21 Feb 2025
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao
Wenqi Pei
Yifei Tao
Haiyang Mei
Mike Zheng Shou
534
0
0
20 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Hui Yuan
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Longji Xu
235
1
0
19 Feb 2025
InsightVision: A Comprehensive, Multi-Level Chinese-based Benchmark for Evaluating Implicit Visual Semantics in Large Vision Language Models
Xiaofei Yin
Y. Hong
Ya Guo
Yi Tu
Weiqiang Wang
Gongshen Liu
Huijia Zhu
VLM
183
1
0
19 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
641
6
0
18 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
Xinsong Zhang
Jingyu Wang
Wenbing Tao
286
22
0
17 Feb 2025
Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent
Junda Wu
Yuxin Xiong
Xintong Li
Yu Xia
Ruoyu Wang
...
Sungchul Kim
Ryan Rossi
Lina Yao
Jingbo Shang
Julian McAuley
CLL
VLM
346
2
0
17 Feb 2025
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection
Jinhe Bi
Yifan Wang
Danqi Yan
Aniri
Wenke Huang
...
Xun Xiao
Hinrich Schuetze
Volker Tresp
Yunpu Ma
Yunpu Ma
VLM
520
32
0
17 Feb 2025
Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models
Shintaro Ozaki
Kazuki Hayashi
Yusuke Sakai
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
LRM
388
3
0
17 Feb 2025
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
International Conference on Learning Representations (ICLR), 2025
Zhenyu Yang
Yihan Hu
Zemin Du
Dizhan Xue
Chuanrui Hu
Jiahong Wu
Fan Yang
Weiming Dong
Changsheng Xu
334
27
0
15 Feb 2025
Previous
1
2
3
...
8
9
10
...
12
13
14
Next
Page 9 of 14
Page
of 14
Go