Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
All Papers
0 / 0 papers shown
Title
Home
Papers
2305.06355
Cited By
v1
v2 (latest)
VideoChat: Chat-Centric Video Understanding
10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (3 upvotes)
Github (3246★)
Papers citing
"VideoChat: Chat-Centric Video Understanding"
50 / 558 papers shown
Title
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Computer Vision and Pattern Recognition (CVPR), 2025
Wei Li
Bing Hu
Rui Shao
Leyang Shen
Liqiang Nie
247
30
0
05 Mar 2025
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
Computer Vision and Pattern Recognition (CVPR), 2025
Ege Özsoy
Chantal Pellegrini
Tobias Czempiel
Felix Tristram
Kun Yuan
David Bani-Harouni
U. Eck
Benjamin Busam
Matthias Keicher
Nassir Navab
302
13
0
04 Mar 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
248
0
0
04 Mar 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Computer Vision and Pattern Recognition (CVPR), 2025
Zitang Zhou
Ke Mei
Yu Lu
Tianyi Wang
Fengyun Rao
374
6
0
03 Mar 2025
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2025
Tianyu Huai
Jie Zhou
Xingjiao Wu
Qin Chen
Qingchun Bai
Ze Zhou
Liang He
MoE
269
10
0
01 Mar 2025
C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
Yuhao Li
Mirana Claire Angel
Salman Khan
Yu Zhu
Jinqiu Sun
Yanning Zhang
Fahad Shahbaz Khan
VGen
221
4
0
27 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
514
12
0
26 Feb 2025
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Caihua Liu
Xu Li
Wenjing Xue
Wei Tang
Xia Feng
183
0
0
20 Feb 2025
L4P: Towards Unified Low-Level 4D Vision Perception
Abhishek Badki
Hang Su
Bowen Wen
Orazio Gallo
VLM
437
4
0
18 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
Xinsong Zhang
Jingyu Wang
Wenbing Tao
265
20
0
17 Feb 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanwei Li
Yu Qi
...
Shen Yan
Bo Zhang
Chaoyou Fu
Peng Gao
Jiaming Song
MLLM
LRM
405
83
0
13 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
609
10
0
07 Feb 2025
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
Jiaxing Zhao
Q. Yang
Yixing Peng
Detao Bai
Shimin Yao
...
Xiang Chen
Shenghao Fu
Weixuan chen
Xihan Wei
Liefeng Bo
VGen
AuLLM
268
27
0
28 Jan 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
308
60
0
28 Jan 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
488
115
0
21 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
IEEE Reviews in Biomedical Engineering (RBME), 2024
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
666
66
0
17 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Computer Vision and Pattern Recognition (CVPR), 2025
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
504
8
0
14 Jan 2025
TimeLogic: A Temporal Logic Benchmark for Video QA
S. Swetha
Hilde Kuehne
Mubarak Shah
145
7
0
13 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Computer Vision and Pattern Recognition (CVPR), 2023
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Yuan Liu
Kaipeng Zhang
Dahua Lin
Yu Qiao
Shiyang Feng
Xiangyu Yue
MLLM
516
190
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access (IEEE Access), 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
422
6
0
10 Jan 2025
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
International Conference on Machine Learning (ICML), 2024
Hao Fei
Shengqiong Wu
Wei Ji
Hao Zhang
Hao Fei
Yang Deng
Wynne Hsu
LRM
VGen
369
142
0
08 Jan 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
Computer Vision and Pattern Recognition (CVPR), 2025
Rui Qian
Shuangrui Ding
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Dahua Lin
Jiaqi Wang
224
30
0
06 Jan 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Wenyi Hong
Yean Cheng
Zhiyong Yang
Weihan Wang
Lefan Wang
Xiaohan Zhang
Xiaotao Gu
Yuxiao Dong
J. Tang
CoGe
VLM
245
24
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
422
32
0
06 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
Enzo Tartaglione
VLM
918
15
0
05 Jan 2025
MLVU: Benchmarking Multi-task Long Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Yueze Wang
Yan Shu
Bo Zhao
Boya Wu
Junjie Zhou
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
452
11
0
03 Jan 2025
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval
Yifan Xu
Xinhao Li
Yichun Yang
Rui Huang
Limin Wang
Limin Wang
VGen
183
0
0
31 Dec 2024
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Computer Vision and Pattern Recognition (CVPR), 2023
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Celine Lee
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
248
38
0
31 Dec 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
786
104
0
31 Dec 2024
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Neural Information Processing Systems (NeurIPS), 2024
Hao Fei
Shengqiong Wu
Hao Zhang
Tat-Seng Chua
Shuicheng Yan
431
70
0
31 Dec 2024
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
Yuanmin Huang
Jilan Xu
Baoqi Pei
Yuping He
Guo Chen
...
Kunpeng Li
C. Yuan
Yidan Wang
Yu Qiao
L. Wang
396
13
0
31 Dec 2024
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Shilin Sun
Wenbin An
Feng Tian
Fang Nan
Qidong Liu
Jing Liu
N. Shah
Ping Chen
328
19
0
18 Dec 2024
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
Cong Wei
Yujie Zhong
Haoxian Tan
Yingsen Zeng
Yong Liu
Zheng Zhao
Yujiu Yang
MLLM
VLM
VOS
252
11
0
18 Dec 2024
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
Computer Vision and Pattern Recognition (CVPR), 2024
Changan Chen
Juze Zhang
S. K. Lakshmikanth
Yusu Fang
Ruizhi Shao
Gordon Wetzstein
L. Fei-Fei
Ehsan Adeli
VGen
312
15
0
13 Dec 2024
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2024
Orr Zohar
Xiaohan Wang
Yann Dubois
Nikhil Mehta
Tong Xiao
...
Xiaofang Wang
F. Xu
Ning Zhang
Serena Yeung-Levy
Xide Xia
VLM
359
0
0
13 Dec 2024
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Chenyu Yang
Xuan Dong
X. Zhu
Weijie Su
Jiahao Wang
H. Tian
Zheyu Chen
Wenhai Wang
Lewei Lu
Jifeng Dai
VLM
192
9
0
12 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xinsong Zhang
Kai Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
350
33
0
12 Dec 2024
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Haobo Wang
Yuxiang Nie
Yongjie Ye
Deng GuanYu
Yanjie Wang
Shuai Li
Haiyang Yu
Jinghui Lu
Can Huang
VLM
MLLM
224
13
0
12 Dec 2024
Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Sai Bhargav Rongali
M. Cui
Ankit Jha
Neha Bhargava
Saurabh Prasad
Biplab Banerjee
238
0
0
12 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Joey Tianyi Zhou
Gedas Bertasius
David J. Crandall
408
5
0
12 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Mingyu Ding
Xihui Liu
LLMAG
LRM
348
18
0
05 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
555
6
0
04 Dec 2024
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
Meng Cao
Haoran Tang
Haoze Zhao
Hangyu Guo
Jing Liu
Ge Zhang
Ruyang Liu
Qiang Sun
Ian Reid
Xiaodan Liang
386
9
0
02 Dec 2024
SEAL: Semantic Attention Learning for Long Video Representation
Computer Vision and Pattern Recognition (CVPR), 2024
Lan Wang
Yujia Chen
Wen-Sheng Chu
Vishnu Boddeti
Du Tran
VLM
530
7
0
02 Dec 2024
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Weiming Ren
Huan Yang
Jie Min
Cong Wei
Lei Ma
848
9
0
01 Dec 2024
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild
Peijun Bao
Chenqi Kong
Zihao Shao
Boon Poh Ng
Meng Hwa Er
Alex C. Kot
226
3
0
01 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
578
5
0
01 Dec 2024
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Xubing Ye
Yukang Gan
Yixiao Ge
Xiao Zhang
Yansong Tang
303
28
0
30 Nov 2024
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Computer Vision and Pattern Recognition (CVPR), 2024
Jianping Jiang
Weiye Xiao
Zhengyu Lin
Han Zhang
Tianxiang Ren
Yang Gao
Zhiqian Lin
Zhongang Cai
Lei Yang
Ziwei Liu
305
8
0
29 Nov 2024
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
Shimin Chen
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
MLLM
284
41
0
27 Nov 2024
Previous
1
2
3
4
5
6
...
10
11
12
Next