2305.06355
VideoChat: Chat-Centric Video Understanding
10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
Papers citing "VideoChat: Chat-Centric Video Understanding"
50 / 423 papers shown
Title
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
23
58
0
27 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Janet Liu
EgoV
LRM
21
15
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
27
3
0
25 Nov 2023
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
Xiuyuan Chen
Yuan Lin
Yuchen Zhang
Weiran Huang
ELM
MLLM
18
26
0
25 Nov 2023
ADriver-I: A General World Model for Autonomous Driving
Fan Jia
Weixin Mao
Yingfei Liu
Yucheng Zhao
Yuqing Wen
Chi Zhang
Xiangyu Zhang
Tiancai Wang
22
63
0
22 Nov 2023
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe
Rusiru Thushara
Muhammad Maaz
H. Rasheed
Salman Khan
Mubarak Shah
Fahad Khan
VLM
MLLM
17
34
0
22 Nov 2023
VLM-Eval: A General Evaluation on Video Large Language Models
Shuailin Li
Yuang Zhang
Yucheng Zhao
Qiuyue Wang
Fan Jia
Yingfei Liu
Tiancai Wang
MLLM
ELM
10
2
0
20 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
194
576
0
16 Nov 2023
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal
Yonatan Bitton
Idan Szpektor
Kai-Wei Chang
Aditya Grover
28
14
0
15 Nov 2023
GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Serwan Jassim
Mario S. Holubar
Annika Richter
Cornelius Wolff
Xenia Ohmer
Elia Bruni
ELM
11
9
0
15 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
28
12
0
14 Nov 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
32
217
0
14 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
66
3
0
10 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
116
367
0
07 Nov 2023
VQPy: An Object-Oriented Approach to Modern Video Analytics
Shan Yu
Zhenting Zhu
Yu Chen
Hanchen Xu
Pengzhan Zhao
Yang Wang
Arthi Padmanabhan
Hugo Latapie
Harry Xu
26
3
0
03 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
21
62
0
30 Oct 2023
MarineGPT: Unlocking Secrets of Ocean to the Public
Ziqiang Zheng
Jipeng Zhang
Tuan-Anh Vu
Shizhe Diao
Yue Him Wong Tim
Sai-Kit Yeung
28
11
0
20 Oct 2023
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models
Mingwei Zhu
Leigang Sha
Yu Shu
Kangjia Zhao
Tiancheng Zhao
Jianwei Yin
LRM
22
0
0
20 Oct 2023
LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation
Shengqiang Zhang
Philipp Wicke
Lutfi Kerem Senel
Luis F. C. Figueredo
Abdeldjallil Naceri
Sami Haddadin
Barbara Plank
Hinrich Schütze
LM&Ro
21
10
0
18 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
31
116
0
16 Oct 2023
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction
Xiang Hao
Jibin Wu
Jianwei Yu
Chenglin Xu
Kay Chen Tan
19
10
0
11 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
26
12
0
09 Oct 2023
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu
Qihang Fan
Tingkai Liu
Linjie Yang
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
VGen
21
12
0
08 Oct 2023
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
Zhenhua Xu
Yujia Zhang
Enze Xie
Zhen Zhao
Yong Guo
Kwan-Yee. K. Wong
Zhenguo Li
Hengshuang Zhao
MLLM
18
250
0
02 Oct 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
22
29
0
27 Sep 2023
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
Yaohui Wang
Xinyuan Chen
Xin Ma
Shangchen Zhou
Ziqi Huang
...
Chen Change Loy
Bo Dai
Dahua Lin
Yu Qiao
Ziwei Liu
VGen
DiffM
26
222
0
26 Sep 2023
A Survey of Hallucination in Large Foundation Models
Vipula Rawte
A. Sheth
Amitava Das
HILM
LRM
23
339
0
12 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
46
448
0
11 Sep 2023
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
Ashmit Khandelwal
Aditya Agrawal
Aanisha Bhattacharyya
Yaman Kumar Singla
Somesh Singh
...
Ishita Dasgupta
Stefano Petrangeli
R. Shah
Changyou Chen
Balaji Krishnamurthy
11
8
0
01 Sep 2023
FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings
Yulin Su
Min Yang
Minghui Qiu
Jing Wang
Tao Wang
VLM
25
0
0
17 Aug 2023
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
Zehan Wang
Haifeng Huang
Yang Zhao
Ziang Zhang
Zhou Zhao
19
58
0
17 Aug 2023
Fine-Tune Language Models as Multi-Modal Differential Equation Solvers
Liu Yang
Siting Liu
Stanley J. Osher
14
0
0
09 Aug 2023
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Dongyang Yu
Shihao Wang
Yuan Fang
Wangpeng An
VGen
12
0
0
08 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
33
83
0
03 Aug 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song
Wenhao Chai
Guanhong Wang
Yucheng Zhang
Haoyang Zhou
...
Tianbo Ye
Yanting Zhang
Yang Lu
Jenq-Neng Hwang
Gaoang Wang
VLM
MLLM
17
259
0
31 Jul 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM
ELM
16
493
0
30 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
22
241
0
13 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
46
499
0
12 Jul 2023
Emu: Generative Pretraining in Multimodality
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
27
126
0
11 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
80
223
0
07 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Hongsheng Li
31
100
0
03 Jul 2023
Look, Remember and Reason: Grounded reasoning in videos with language models
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Reza Pourreza
Pulkit Madan
Roland Memisevic
LRM
30
7
0
30 Jun 2023
Explainable Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Licai Sun
Hao Gu
Zhuofan Wen
...
Shan Liang
Ya Li
Jiangyan Yi
B. Liu
Jianhua Tao
MLLM
8
6
0
27 Jun 2023
FunQA: Towards Surprising Video Comprehension
Binzhu Xie
Sicheng Zhang
Zitang Zhou
Bo-wen Li
Yuanhan Zhang
Jack Hessel
Jingkang Yang
Ziwei Liu
26
20
0
26 Jun 2023
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Chunyuan Li
MLLM
VLM
14
20
0
26 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
14
688
0
26 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
33
551
0
23 Jun 2023
MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators
Yaqi Zhang
Di Huang
B. Liu
Shixiang Tang
Yan Lu
Lu Chen
Lei Bai
Qi Chu
Nenghai Yu
Wanli Ouyang
72
96
0
19 Jun 2023
Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost
Juexiao Zhou
Xiuying Chen
Xin Gao
LM&MA
AI4CE
85
12
0
19 Jun 2023