Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2305.06355
Cited By
v1
v2 (latest)
VideoChat: Chat-Centric Video Understanding
10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (3 upvotes)
Github (3246★)
Papers citing
"VideoChat: Chat-Centric Video Understanding"
50 / 563 papers shown
Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models
IEEE Transactions on Vehicular Technology (IEEE Trans. Veh. Technol.), 2024
Yizhou Huang
Yihua Cheng
Kezhi Wang
LRM
189
4
0
30 Sep 2024
Visual Context Window Extension: A New Perspective for Long Video Understanding
Hongchen Wei
Zhenzhong Chen
VLM
298
1
0
30 Sep 2024
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Neural Information Processing Systems (NeurIPS), 2024
Zechen Bai
Tong He
Haiyang Mei
Pichao Wang
Ziteng Gao
Joya Chen
Lei Liu
Zheng Zhang
Mike Zheng Shou
VLM
VOS
MLLM
257
76
0
29 Sep 2024
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Xiao Wang
Yue Yu
Zijia Lin
Fuzheng Zhang
Di Zhang
Liqiang Nie
VGen
216
5
0
29 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Neural Information Processing Systems (NeurIPS), 2024
Ye Liu
Zongyang Ma
Chen Ma
Yang Wu
Ying Shan
Chang Wen Chen
273
52
0
26 Sep 2024
LLM4Brain: Training a Large Language Model for Brain Video Understanding
Ruizhe Zheng
Lichao Sun
141
4
0
26 Sep 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
733
128
0
26 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
465
21
0
26 Sep 2024
Multi-modal Generative AI: Multi-modal LLMs, Diffusions, and the Unification
X. Wang
Yuwei Zhou
Bin Huang
Hong Chen
Wenwu Zhu
DiffM
493
9
0
23 Sep 2024
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Yan Shu
Peitian Zhang
Zheng Liu
Minghao Qin
Yueze Wang
Tiejun Huang
Bo Zhao
VLM
461
140
0
22 Sep 2024
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Yuzhang Shang
Bingxin Xu
Weitai Kang
Mu Cai
Yuheng Li
Zehao Wen
Zhen Dong
Kurt Keutzer
Yong Jae Lee
Yan Yan
310
11
0
19 Sep 2024
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanmin Wu
Jiayi Lei
...
Guanglu Song
Peng Gao
Yu Liu
Chunyuan Li
Hongsheng Li
MLLM
287
50
0
19 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
422
5
0
19 Sep 2024
Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM
Yuanjie Lyu
Tong Xu
Zihan Niu
Bo Peng
Jing Ke
Enhong Chen
230
0
0
14 Sep 2024
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
European Conference on Computer Vision (ECCV), 2024
Yang Liu
Pengxiang Ding
Siteng Huang
Min Zhang
Han Zhao
Donglin Wang
221
9
0
11 Sep 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Dingxin Cheng
Mingda Li
Jingyu Liu
Yongxin Guo
Bin Jiang
Qingbin Liu
Xi Chen
Bo Zhao
272
12
0
10 Sep 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Neural Information Processing Systems (NeurIPS), 2024
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
249
32
0
29 Aug 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
303
198
0
29 Aug 2024
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
European Conference on Computer Vision (ECCV), 2024
Minghang Zheng
Xinhao Cai
Qingchao Chen
Yuxin Peng
Yang Liu
238
19
0
29 Aug 2024
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models
Qihang Ge
Wei Sun
Yu Zhang
Yunhao Li
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
198
23
0
26 Aug 2024
Response Wide Shut: Surprising Observations in Basic Vision Language Model Capabilities
Shivam Chandhok
Wan-Cyuan Fan
Leonid Sigal
VLM
MLLM
169
7
0
13 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
International Conference on Learning Representations (ICLR), 2024
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
314
230
0
09 Aug 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Chunjiang Ge
Meng Zhao
...
Rongrong Ji
Xing Sun
Ran He
Caifeng Shan
Xing Sun
MLLM
577
147
0
09 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
International Journal of Computer Vision (IJCV), 2024
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
352
24
0
08 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
321
2
0
04 Aug 2024
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Shu Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
286
86
0
02 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris Kitani
Kristen Grauman
VGen
455
11
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
248
12
0
31 Jul 2024
Urban Safety Perception Assessments via Integrating Multimodal Large Language Models with Street View Images
Cities (Cities), 2024
Jiaxin Zhanga
Yunqin Lia
Tomohiro Fukudab
Bowen Wang
260
1
0
29 Jul 2024
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Mingze Xu
Mingfei Gao
Zhe Gan
Hong-You Chen
Zhengfeng Lai
Haiming Gang
Kai Kang
Afshin Dehghan
263
96
0
22 Jul 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Yangzhou Liu
Yue Cao
Zhangwei Gao
Weiyun Wang
Zhe Chen
...
Lewei Lu
Xizhou Zhu
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
313
41
0
22 Jul 2024
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan
Rui Liu
Wenguan Wang
Yi Yang
263
20
0
21 Jul 2024
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
Jie Yang
Xuesong Niu
Nan Jiang
Ruimao Zhang
Siyuan Huang
234
22
0
17 Jul 2024
Refusing Safe Prompts for Multi-modal Large Language Models
Zedian Shao
Hongbin Liu
Yuepeng Hu
Neil Zhenqiang Gong
MLLM
LRM
221
4
0
12 Jul 2024
MAVIS: Mathematical Visual Instruction Tuning
Renrui Zhang
Xinyu Wei
Dongzhi Jiang
Yichi Zhang
Ziyu Guo
...
Aojun Zhou
Bin Wei
Shanghang Zhang
Peng Gao
Hongsheng Li
MLLM
134
9
0
11 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
350
12
0
11 Jul 2024
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding
Minghui Wu
Chenxu Zhao
Anyang Su
Donglin Di
Tianyu Fu
...
Min He
Ya Gao
Meng Ma
Kun Yan
Ping Wang
323
6
0
11 Jul 2024
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang
Xinpeng Ding
Chunwei Wang
J. N. Han
Yulong Liu
Hengshuang Zhao
Hang Xu
Lu Hou
Wei Zhang
Xiaodan Liang
VLM
198
13
0
11 Jul 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLM
VLM
367
439
0
10 Jul 2024
AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Guoying Zhao
Jiangyan Yi
Bin Liu
Jianhua Tao
257
8
0
10 Jul 2024
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar
Xiaohan Wang
Yonatan Bitton
Idan Szpektor
Serena Yeung-Levy
VLM
LRM
237
17
0
08 Jul 2024
Multimodal Language Models for Domain-Specific Procedural Video Summarization
Nafisa Hussain
303
0
0
07 Jul 2024
Enhance the Robustness of Text-Centric Multimodal Alignments
Ting-Yu Yen
Yun-Da Tsai
Keng-Te Liao
Shou-De Lin
267
4
0
06 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
324
24
0
05 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
299
171
0
03 Jul 2024
Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs
Jinmin Li
Kuofeng Gao
Yang Bai
Jingyun Zhang
Shu-Tao Xia
293
6
0
02 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM
MLLM
412
23
0
01 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
304
115
0
30 Jun 2024
ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos
Jr-Jen Chen
Yu-Chien Liao
Hsi-Che Lin
Yu-Chu Yu
Yen-Chun Chen
Yu-Chiang Frank Wang
354
40
0
27 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
183
3
0
26 Jun 2024
Previous
1
2
3
...
6
7
8
...
10
11
12
Next