Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2401.17981
Cited By
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study
31 January 2024
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study"
8 / 8 papers shown
Title
Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models
Y. Zhang
Chunwang Zou
Bo Wang
Jing Qin
51
0
0
24 Mar 2025
VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
Jiaqi Wang
Yifei Gao
Jitao Sang
MLLM
110
2
0
24 Nov 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
80
139
0
29 Apr 2024
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
Jinrong Yang
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
MLLM
VLM
66
73
0
11 Dec 2023
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen
Leyang Shen
Rui Shao
Xiang Deng
Liqiang Nie
VLM
MLLM
65
38
0
20 Nov 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Ji Zhang
Qin Jin
Liang He
Xin Lin
Feiyan Huang
VLM
MLLM
121
83
0
08 Oct 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
110
221
0
18 Sep 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
247
4,186
0
30 Jan 2023
1