2404.11865
From Image to Video, what do we need in multimodal LLMs?
18 April 2024
Suyuan Huang, Haoxin Zhang, Yan Gao, Yao Hu, Zengchang Qin
VLM
Papers citing "From Image to Video, what do we need in multimodal LLMs?"
3 / 3 papers shown
Title
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang, Xin Wang, Hong Chen, Zihan Song, Wenwu Zhu
MLLM
69
80
0
30 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li-ming Yuan
VLM
MLLM
182
576
0
16 Nov 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, ..., Junfeng Tian, Qiang Qi, Ji Zhang, Feiyan Huang, Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023