Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2305.06355
Cited By
v1
v2 (latest)
VideoChat: Chat-Centric Video Understanding
10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (3 upvotes)
Github (3246★)
Papers citing
"VideoChat: Chat-Centric Video Understanding"
50 / 563 papers shown
User Intent Recognition and Satisfaction with Large Language Models: A User Study with ChatGPT
Anna Bodonhelyi
Efe Bozkir
Shuo Yang
Enkelejda Kasneci
Gjergji Kasneci
ELM
AI4MH
192
28
0
03 Feb 2024
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming
Pengyuan Zhou
Lin Wang
Zhi Liu
Yanbin Hao
Pan Hui
Sasu Tarkoma
J. Kangasharju
VGen
255
50
0
30 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
432
9
0
18 Jan 2024
On the Audio Hallucinations in Large Audio-Video Language Models
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
233
12
0
18 Jan 2024
Vlogger: Make Your Dream A Vlog
Computer Vision and Pattern Recognition (CVPR), 2024
Shaobin Zhuang
Kunchang Li
Xinyuan Chen
Yaohui Wang
Ziwei Liu
Yu Qiao
Yali Wang
VGen
DiffM
156
64
0
17 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
International Conference on Machine Learning (ICML), 2024
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
515
64
0
16 Jan 2024
Towards A Better Metric for Text-to-Video Generation
Jay Zhangjie Wu
Guian Fang
Haoning Wu
Xintao Wang
Yixiao Ge
...
Rui Zhao
Weisi Lin
Wynne Hsu
Ying Shan
Mike Zheng Shou
VGen
256
44
0
15 Jan 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
Computer Vision and Pattern Recognition (CVPR), 2024
Xinyu Wang
Bohan Zhuang
Qi Wu
204
23
0
12 Jan 2024
Distilling Vision-Language Models on Millions of Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Yue Zhao
Long Zhao
Xingyi Zhou
Jialin Wu
Chun-Te Chu
...
Hartwig Adam
Ting Liu
Boqing Gong
Philipp Krahenbuhl
Liangzhe Yuan
VLM
283
20
0
11 Jan 2024
Video Anomaly Detection and Explanation via Large Language Models
Hui Lv
Qianru Sun
253
53
0
11 Jan 2024
SonicVisionLM: Playing Sound with Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
189
3
0
09 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
330
13
0
03 Jan 2024
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
Computer Vision and Pattern Recognition (CVPR), 2024
Xinpeng Ding
Jinahua Han
Hang Xu
Xiaodan Liang
Wei Zhang
Xiaomeng Li
314
84
0
02 Jan 2024
Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education
Learning and Individual Differences (LID), 2024
Arne Bewersdorff
Christian Hartmann
Marie Hornberger
Kathrin Seßler
Maria Bannert
Enkelejda Kasneci
Gjergji Kasneci
Xiaoming Zhai
Claudia Nerdel
302
87
0
01 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
760
174
0
29 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
283
274
0
28 Dec 2023
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Houlun Chen
Xin Wang
Hong Chen
Zihan Song
Jia Jia
Wenwu Zhu
LRM
247
18
0
28 Dec 2023
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
198
30
0
27 Dec 2023
Plan, Posture and Go: Towards Open-World Text-to-Motion Generation
Jinpeng Liu
Wen-Dao Dai
Chunyu Wang
Yiji Cheng
Yansong Tang
Xin Tong
VGen
DiffM
284
24
0
22 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
649
2,210
0
21 Dec 2023
LLM4VG: Large Language Models Evaluation for Video Grounding
Wei Feng
Xin Wang
Hong Chen
Zeyang Zhang
Zihan Song
Yuwei Zhou
Wenwu Zhu
408
10
0
21 Dec 2023
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
LRM
398
422
0
20 Dec 2023
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
Chun-Mei Feng
Yang Bai
Yaoyu Zhang
Zhen Li
Salman Khan
Wangmeng Zuo
Xinxing Xu
Rick Siow Mong Goh
Yong-Jin Liu
273
10
0
19 Dec 2023
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Wenhai Wang
Jiangwei Xie
ChuanYang Hu
Haoming Zou
Jianan Fan
Wenwen Tong
Yang Wen
Silei Wu
Hanming Deng
Zhiqi Li
364
217
0
14 Dec 2023
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
Neural Information Processing Systems (NeurIPS), 2023
Haifeng Huang
Zehan Wang
Rongjie Huang
Luping Liu
Xize Cheng
Yang Zhao
Tao Jin
Zhou Zhao
334
12
0
13 Dec 2023
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma
Xiaojie Jin
Heng Wang
Yuchen Xian
Jiashi Feng
Yi Yang
263
20
0
12 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
402
199
0
11 Dec 2023
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
364
2
0
11 Dec 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
Yi Chen
Yuying Ge
Yixiao Ge
Mingyu Ding
Bohao Li
Rui Wang
Rui-Lan Xu
Ying Shan
Xihui Liu
LLMAG
ELM
LRM
371
33
0
11 Dec 2023
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
254
68
0
11 Dec 2023
LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering
Hongjie Zhang
Lu Dong
Yi Liu
Yifei Huang
Z. Ling
Yali Wang
Limin Wang
340
32
0
08 Dec 2023
GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition
Zheng Lian
Guoying Zhao
Haiyang Sun
Kang Chen
Zhuofan Wen
Hao Gu
Yinan Han
Jianhua Tao
268
82
0
07 Dec 2023
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Yuan Liu
Dahua Lin
Hengshuang Zhao
MLLM
479
36
0
05 Dec 2023
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
Yizhou Wang
Ruiyi Zhang
Haoliang Wang
Uttaran Bhattacharya
Yun Fu
Gang Wu
MLLM
243
19
0
04 Dec 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM
MLLM
372
364
0
04 Dec 2023
Towards Learning a Generalist Model for Embodied Navigation
Computer Vision and Pattern Recognition (CVPR), 2023
Duo Zheng
Shijia Huang
Lin Zhao
Yiwu Zhong
Liwei Wang
LM&Ro
643
118
0
04 Dec 2023
Zero-Shot Video Question Answering with Procedural Programs
Rohan Choudhury
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
188
37
0
01 Dec 2023
Dolphins: Multimodal Language Model for Driving
European Conference on Computer Vision (ECCV), 2023
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
330
125
0
01 Dec 2023
ChatPose: Chatting about 3D Human Pose
Computer Vision and Pattern Recognition (CVPR), 2023
Yao Feng
Jing Lin
Sai Kumar Dwivedi
Yu Sun
Priyanka Patel
Michael J. Black
3DH
298
68
0
30 Nov 2023
VTimeLLM: Empower LLM to Grasp Video Moments
Computer Vision and Pattern Recognition (CVPR), 2023
Bin Huang
Xin Wang
Hong Chen
Zihan Song
Wenwu Zhu
MLLM
328
244
0
30 Nov 2023
VBench: Comprehensive Benchmark Suite for Video Generative Models
Computer Vision and Pattern Recognition (CVPR), 2023
Ziqi Huang
Yinan He
Jiashuo Yu
Fan Zhang
Chenyang Si
...
Xinyuan Chen
Limin Wang
Dahua Lin
Yu Qiao
Ziwei Liu
VGen
523
1,001
0
29 Nov 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
258
49
0
29 Nov 2023
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
European Conference on Computer Vision (ECCV), 2023
Shicheng Li
Lei Li
Shuhuai Ren
Yuanxin Liu
Yi Liu
Rundong Gao
Xu Sun
Lu Hou
227
49
0
29 Nov 2023
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
European Conference on Computer Vision (ECCV), 2023
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM
MLLM
333
480
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Computer Vision and Pattern Recognition (CVPR), 2023
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
673
872
0
28 Nov 2023
SEED-Bench-2: Benchmarking Multimodal Large Language Models
Bohao Li
Yuying Ge
Yixiao Ge
Guangzhi Wang
Rui Wang
Ruimao Zhang
Ying Shan
MLLM
VLM
188
85
0
28 Nov 2023
AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
Computer Vision and Pattern Recognition (CVPR), 2023
Zixiang Zhou
Yu Wan
Baoyuan Wang
191
52
0
28 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
216
92
0
27 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Computer Vision and Pattern Recognition (CVPR), 2023
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
208
32
0
27 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoV
LRM
265
48
0
27 Nov 2023
Previous
1
2
3
...
10
11
12
9
Next
Page 10 of 12
Page
of 12
Go