2305.06355
VideoChat: Chat-Centric Video Understanding
10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
ArXiv (abs)
PDF
HTML
HuggingFace (3 upvotes)
Github (3246★)
Papers citing
"VideoChat: Chat-Centric Video Understanding"
50 / 563 papers shown
DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS
Qingyang Li
Yihang Zhang
Zhidong Jia
Yannan Hu
Lei Zhang
Jianrong Zhang
Yongming Xu
Yong Cui
Xinggong Zhang
Xinggong Zhang
AI4CE
238
17
0
13 May 2024
How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
Muhammad Uzair Khattak
Muhammad Ferjad Naeem
Jameel Hassan
Muzammal Naseer
Federico Tombari
Fahad Shahbaz Khan
Salman Khan
LRM
ELM
291
28
0
06 May 2024
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning
Yuanhan Zhang
Kaichen Zhang
Yue Liu
Fanyi Pu
Christopher Arif Setiadharma
Jingkang Yang
Ziwei Liu
VGen
229
16
0
06 May 2024
Octopi: Object Property Reasoning with Large Tactile-Language Models
Samson Yu
Kelvin Lin
Anxing Xiao
Jiafei Duan
Harold Soh
LRM
406
47
0
05 May 2024
Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Hang Du
Sicheng Zhang
Binzhu Xie
Gu Nan
Jiayang Zhang
...
Xuhuan Li
Hao Zhang
Jianhang Chen
Qimei Cui
Xiao Tao
CML
222
44
0
30 Apr 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Lei Li
Xi Li
Gaoang Wang
VLM
MLLM
256
53
0
26 Apr 2024
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Guoying Zhao
Zhuofan Wen
Siyuan Zhang
...
Yinan Han
Xiaoshi Zhong
Guoying Zhao
Björn W. Schuller
Jianhua Tao
VLM
394
37
0
26 Apr 2024
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Lin Xu
Yilin Zhao
Daquan Zhou
Zhijie Lin
See Kiong Ng
Jiashi Feng
MLLM
VLM
274
283
0
25 Apr 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
534
1,004
0
25 Apr 2024
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Bohao Li
Yuying Ge
Yi Chen
Yixiao Ge
Ruimao Zhang
Ying Shan
VLM
211
106
0
25 Apr 2024
Pegasus-v1 Technical Report
Raehyuk Jung
Hyojun Go
Jaehyuk Yi
Jiho Jang
Daniel Kim
...
Maninder Saini
Meredith Sanders
Soyoung Lee
Sue Kim
Travis Couture
MLLM
VLM
105
5
0
23 Apr 2024
Graphic Design with Large Multimodal Model
Yutao Cheng
Zhao Zhang
Maoke Yang
Hui Nie
Chunyuan Li
Xinglong Wu
Jie Shao
330
27
0
22 Apr 2024
From Image to Video, what do we need in multimodal LLMs?
Suyuan Huang
Haoxin Zhang
Yan Gao
Honggu Chen
Yan Gao
Yao Hu
Zhan Qin
VLM
303
12
0
18 Apr 2024
HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
Siddhant Bansal
Michael Wray
Dima Damen
219
10
0
15 Apr 2024
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe
Satya Narayan Shukla
Omid Poursaeed
Michael S. Ryoo
Tsung-Yu Lin
LRM
211
61
0
11 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
362
184
0
08 Apr 2024
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard
Zhixi Cai
Shiki Wen
Hamid Rezatofighi
184
13
0
06 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
371
62
0
05 Apr 2024
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
European Conference on Computer Vision (ECCV), 2024
Kailin Li
Jingbo Wang
Lixin Yang
Cewu Lu
Bo Dai
259
34
0
04 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
European Conference on Computer Vision (ECCV), 2024
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
372
128
0
04 Apr 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
383
124
0
01 Apr 2024
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu
Chen Li
Haoran Tang
Yixiao Ge
Ying Shan
Ge Li
210
125
0
30 Mar 2024
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang
Shijia Liao
Subhashree Radhakrishnan
Hongxu Yin
Pavlo Molchanov
Zhiding Yu
Jan Kautz
VLM
241
104
0
27 Mar 2024
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Wonkyun Kim
Changin Choi
Wonseok Lee
Wonjong Rhee
VLM
240
81
0
27 Mar 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao
Shengju Qian
Han Xiao
Guanglu Song
Zhuofan Zong
Letian Wang
Yu Liu
Jiaming Song
VGen
LRM
MLLM
370
217
0
25 Mar 2024
Elysium: Exploring Object-level Perception in Videos via MLLM
Hang Wang
Yanjie Wang
Yongjie Ye
Yuxiang Nie
Can Huang
MLLM
327
39
0
25 Mar 2024
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yunlong Tang
Daiki Shimada
Jing Bi
Chenliang Xu
Hang Hua
Chenliang Xu
VGen
387
17
0
24 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
European Conference on Computer Vision (ECCV), 2024
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
273
104
0
22 Mar 2024
FMM-Attack: A Flow-based Multi-modal Adversarial Attack on Video-based LLMs
Jinmin Li
Kuofeng Gao
Yang Bai
Jingyun Zhang
Shu-Tao Xia
Yisen Wang
AAML
245
12
0
20 Mar 2024
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
Zhipeng Huang
Zhizheng Zhang
Zheng-Jun Zha
Yan Lu
Baining Guo
VLM
155
6
0
19 Mar 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Computer Vision and Pattern Recognition (CVPR), 2024
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
483
7
0
19 Mar 2024
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang
Xiaojun Meng
Jianxin Liang
Yuxuan Wang
Qun Liu
Dongyan Zhao
230
62
0
15 Mar 2024
GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing
Hao Lu
Xuesong Niu
Jiyao Wang
Yin Wang
Qingyong Hu
...
Dengbo He
Shuiguang Deng
Hao Chen
Ying-Cong Chen
Shiguang Shan
MLLM
277
28
0
09 Mar 2024
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
European Conference on Computer Vision (ECCV), 2024
Qilang Ye
Zitong Yu
Rui Shao
Xinyu Xie
Juil Sock
Simeng Qin
MLLM
413
49
0
07 Mar 2024
Embodied Understanding of Driving Scenarios
European Conference on Computer Vision (ECCV), 2024
Yunsong Zhou
Linyan Huang
Qingwen Bu
Jia Zeng
Tianyu Li
Hang Qiu
Hongzi Zhu
Minyi Guo
Yu Qiao
Hongyang Li
LM&Ro
255
53
0
07 Mar 2024
GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features
Yunzhuo Sun
Yifang Xu
Zien Xie
Yukun Shu
Sidan Du
331
10
0
03 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
371
343
0
29 Feb 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang
Yiming Ren
Hao Luo
Tiantong Li
Chenxiang Yan
...
Qingyun Li
Lewei Lu
Xizhou Zhu
Yu Qiao
Jifeng Dai
MLLM
319
86
0
29 Feb 2024
Navigating Hallucinations for Reasoning of Unintentional Activities
Shresth Grover
Vibhav Vineet
Yogesh S Rawat
LRM
318
3
0
29 Feb 2024
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
249
9
0
29 Feb 2024
OSCaR: Object State Captioning and State Change Representation
Nguyen Nguyen
Jing Bi
Ali Vosoughi
Yapeng Tian
Pooyan Fazli
Chenliang Xu
552
14
0
27 Feb 2024
PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models
Dingkun Guo
Yuqi Xiang
Shuqi Zhao
Xinghao Zhu
Masayoshi Tomizuka
Mingyu Ding
Wei Zhan
249
14
0
26 Feb 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Yao Mu
Junting Chen
Qinglong Zhang
Shoufa Chen
Qiaojun Yu
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Mingyu Ding
Ping Luo
253
46
0
25 Feb 2024
Slot-VLM: SlowFast Slots for Video-Language Modeling
Jiaqi Xu
Cuiling Lan
Wenxuan Xie
Xuejin Chen
Yan Lu
MLLM
VLM
151
10
0
20 Feb 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian
Juncheng Billy Li
Yu-hao Wu
Yaobo Ye
Hao Fei
Tat-Seng Chua
Yueting Zhuang
Siliang Tang
MLLM
LRM
385
100
0
18 Feb 2024
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
739
144
0
13 Feb 2024
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti
Suraj Nair
Ashwin Balakrishna
Percy Liang
Thomas Kollar
Dorsa Sadigh
MLLM
VLM
275
240
0
12 Feb 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balažević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
464
46
0
08 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Jiaming Song
Yu Qiao
Shiyang Feng
MLLM
543
139
0
08 Feb 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
International Conference on Machine Learning (ICML), 2024
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
...
Yuliang Liu
Chen Zhang
Yang Song
Kun Gai
Yadong Mu
VGen
267
79
0
05 Feb 2024