2408.01319
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
2 August 2024
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
Yi Pan
Mengyuan Liu
Peiran Gu
Sichen Xia
Wenjun Li
Yutong Zhang
Zihao Wu
Zheng Liu
Tianyang Zhong
Bao Ge
Tuo Zhang
Ning Qiang
Xintao Hu
Xi Jiang
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
Papers citing "A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks"
8 / 8 papers shown
Title
Position: Foundation Models Need Digital Twin Representations
Yiqing Shen
Hao Ding
Lalithkumar Seenivasan
Tianmin Shu
Mathias Unberath
AI4CE
31
0
0
01 May 2025
Video-Bench: Human-Aligned Video Generation Benchmark
Hui Han
Siyuan Li
Jiaqi Chen
Yiwen Yuan
Yuling Wu
...
Y. Li
J. Zhang
Chi Zhang
Li Li
Yongxin Ni
EGVM
VGen
65
0
0
07 Apr 2025
Do Language Models Understand Time?
Xi Ding
Lei Wang
158
0
0
18 Dec 2024
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
154
280
0
14 Oct 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
107
221
0
18 Sep 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
845
0
17 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
193
375
0
01 Feb 2021