Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.02490
Cited By
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
4 August 2023
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities"
50 / 132 papers shown
Title
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
64
0
0
08 May 2025
SITE: towards Spatial Intelligence Thorough Evaluation
W. Wang
Reuben Tan
Pengyue Zhu
Jianwei Yang
Zhengyuan Yang
Lijuan Wang
Andrey Kolobov
Jianfeng Gao
Boqing Gong
41
0
0
08 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
74
1
0
05 May 2025
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
Jinpeng Chen
Runmin Cong
Yuzhi Zhao
Hongzheng Yang
Guangneng Hu
H. Ip
Sam Kwong
CLL
KELM
61
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
57
0
0
05 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
52
0
0
04 May 2025
Rethinking Visual Layer Selection in Multimodal LLMs
H. Chen
Junyan Lin
Xinhao Chen
Yue Fan
Xin Jin
Hui Su
Jianfeng Dong
Jinlan Fu
Xiaoyu Shen
VLM
93
0
0
30 Apr 2025
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
Chenkai Zhang
Yiming Lei
Z. Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
AI4TS
46
0
0
30 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
X. Li
MLLM
71
0
0
29 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Z. Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
89
0
0
28 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
L. Varshney
54
0
0
27 Apr 2025
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou
Z. Wang
Tianle Wang
Shangyu Xing
Peng Xia
...
Chetan Bansal
Weitong Zhang
Ying Wei
Mohit Bansal
Huaxiu Yao
54
0
0
27 Apr 2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
J. Liu
Hangyu Guo
Ranjie Duan
Xingyuan Bu
Yancheng He
...
Yingshui Tan
Yanan Wu
Jihao Gu
Y. Li
J. Zhu
MLLM
76
0
0
25 Apr 2025
Fast-Slow Thinking for Large Vision-Language Model Reasoning
W. L. Xiao
Leilei Gan
Weilong Dai
Wanggui He
Ziwei Huang
...
Fangxun Shu
Zhelun Yu
Peng Zhang
Hao Jiang
Fei Wu
ReLM
LRM
AI4CE
77
0
0
25 Apr 2025
Token Sequence Compression for Efficient Multimodal Computing
Yasmine Omri
Parth Shroff
Thierry Tambe
51
0
0
24 Apr 2025
V
2
^2
2
R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi Ren Fung
43
0
0
23 Apr 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
Z. Wang
Senthil Purushwalkam
Caiming Xiong
S.
Heng Ji
R. Xu
38
0
0
23 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
D. Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
W. Wang
MLLM
VLM
66
7
1
14 Apr 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Y. Cao
D. Lin
Jiaqi Wang
OffRL
56
1
0
10 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Z. Chen
Zongyu Lin
MLLM
VLM
MoE
109
1
0
10 Apr 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLM
OffRL
LRM
35
0
0
10 Apr 2025
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Yi Nian
Shenzhe Zhu
Yuehan Qin
Li Li
Z. Wang
Chaowei Xiao
Yue Zhao
21
0
0
03 Apr 2025
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Jiawei Wang
Yushen Zuo
Yuanjun Chai
Z. Liu
Yichen Fu
Yichun Feng
Kin-Man Lam
AAML
VLM
40
0
0
02 Apr 2025
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang
Yurui Chen
Yanpeng Zhou
Yueming Xu
Ze Huang
...
Xinyue Cai
G. Huang
Xingyue Quan
Hang Xu
Li Zhang
LRM
87
0
0
29 Mar 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Y. Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
75
3
0
28 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
90
0
0
26 Mar 2025
RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models
Mehdi Moshtaghi
Siavash H. Khajavi
Joni Pajarinen
VLM
41
0
0
25 Mar 2025
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Wei Song
Y. Wang
Zijia Song
Yadong Li
Haoze Sun
Weipeng Chen
Zenan Zhou
Jianhua Xu
Jiaqi Wang
Kaicheng Yu
60
2
0
18 Mar 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
J. Li
Xiang Yue
Bo Li
Ping Nie
Kai Zou
Wenhu Chen
LRM
74
2
0
13 Mar 2025
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
Xudong Tan
Peng Ye
Chongjun Tu
Jianjian Cao
Yaoxin Yang
Lin Zhang
Dongzhan Zhou
Tao Chen
VLM
51
0
0
13 Mar 2025
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu
Khoi Duc Nguyen
Preeti Mukherjee
Saurabh Bagchi
Somali Chaterji
Yingyu Liang
Yin Li
LRM
42
1
0
13 Mar 2025
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang
Quan Cui
Bingchen Zhao
Cheng Yang
MLLM
SyDa
49
0
0
11 Mar 2025
Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
H. Shen
63
1
0
11 Mar 2025
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei
Faizan Siddiqui
Jiacong Xu
Vishal M. Patel
Shao-Yuan Lo
VLM
83
0
0
10 Mar 2025
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
Wei-Yao Wang
Zhao Wang
Helen Suzuki
Yoshiyuki Kobayashi
LRM
50
1
0
04 Mar 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
63
0
0
26 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
77
3
0
26 Feb 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
100
6
0
25 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
74
8
0
21 Feb 2025
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
Shengguang Wu
Fan-Yun Sun
Kaiyue Wen
Nick Haber
VLM
75
0
0
20 Feb 2025
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
Junkai Chen
Zhijie Deng
Kening Zheng
Yibo Yan
Shuliang Liu
PeiJun Wu
Peijie Jiang
J. Liu
Xuming Hu
MU
55
3
0
18 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
J. Zhao
M. Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
56
1
0
18 Feb 2025
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou
Jian Kang
George Kesidis
Lu Lin
102
1
0
18 Feb 2025
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Yi Li
Yuquan Deng
J. Zhang
Joel Jang
Marius Memme
...
Fabio Ramos
Dieter Fox
Anqi Li
Abhishek Gupta
Ankit Goyal
LM&Ro
86
5
0
08 Feb 2025
ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models
Jingwei Yi
Junhao Yin
Ju Xu
Peng Bao
Y. Wang
Wei Fan
H. Wang
42
0
0
20 Jan 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Z. Chen
Mingxiao Li
Z. Chen
Nan Du
Xiaolong Li
Yuexian Zou
53
0
0
19 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
D. Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
104
107
0
10 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLM
VLM
59
25
0
07 Jan 2025
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang
Yuchang Su
Yiming Liu
Xiaohan Wang
James Burgess
...
Josiah Aklilu
Alejandro Lozano
Anjiang Wei
Ludwig Schmidt
Serena Yeung-Levy
50
3
0
06 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
91
45
0
03 Jan 2025
1
2
3
Next