2305.06355
Cited By
v1
v2 (latest)
VideoChat: Chat-Centric Video Understanding
10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
HuggingFace (3 upvotes)
Github (3246★)
Papers citing
"VideoChat: Chat-Centric Video Understanding"
50 / 561 papers shown
SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding
Chang-Hsun Wu
Kai-Po Chang
Yu-Yang Sheng
Hung-Kai Chung
Kuei-Chun Wang
Yu-Jie Wang
MLLM
222
0
0
04 Dec 2025
PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement
Yu-Wei Zhan
Xin Wang
Hong Chen
Tongtong Feng
Wei Feng
Ren Wang
Guangyao Li
Qing Li
Wenwu Zhu
VGen
288
0
0
04 Dec 2025
ViDiC: Video Difference Captioning
J. Wu
S. Li
Zhaozhou Bian
J. Chen
Runzhe Wen
An Ping
Yiwen He
Jiakai Wang
Yuanxing Zhang
Jiaheng Liu
153
0
0
03 Dec 2025
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
Chenting Wang
Yuhan Zhu
Yicheng Xu
Jiange Yang
Ziang Yan
Yali Wang
Yi Wang
Limin Wang
VGen
165
0
0
01 Dec 2025
Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
Apratim Bhattacharyya
Bicheng Xu
Sanjay Haresh
Reza Pourreza
Litian Liu
Sunny Panchal
Pulkit Madan
Leonid Sigal
Roland Memisevic
112
0
0
27 Nov 2025
AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs
Shuhan Xia
Peipei Li
Xuannan Liu
Dongsen Zhang
Xinyu Guo
Zekun Li
AAML
212
0
0
26 Nov 2025
Unboxing the Black Box: Mechanistic Interpretability for Algorithmic Understanding of Neural Networks
Bianka Kowalska
Halina Kwaśnicka
179
0
0
24 Nov 2025
VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
Boyu Chen
Zikang Wang
Zhengrong Yue
Kainan Yan
Chenyun Yu
...
Yafei Wen
Xiaoxin Chen
Yang Liu
Peng Li
Yali Wang
LLMAG
324
3
0
24 Nov 2025
VDC-Agent: When Video Detailed Captioners Evolve Themselves via Agentic Self-Reflection
Qiang Wang
Xinyuan Gao
Songlin Dong
Jizhou Han
Jiangyang Li
Yuhang He
Yihong Gong
VGen
155
1
0
24 Nov 2025
VideoPerceiver: Enhancing Fine-Grained Temporal Perception in Video Multimodal Large Language Models
Fufangchen Zhao
Liao Zhang
Daiqi Shi
Yuanjun Gao
Chen Ye
Yang Cai
Jian Gao
Danfeng Yan
VLM
140
0
0
24 Nov 2025
EventBench: Towards Comprehensive Benchmarking of Event-based MLLMs
Shaoyu Liu
Jianing Li
Guanghui Zhao
Y. Zhang
Xiangyang Ji
73
0
0
23 Nov 2025
ViMix-14M: A Curated Multi-Source Video-Text Dataset with Long-Form, High-Quality Captions and Crawl-Free Access
Timing Yang
Sucheng Ren
Alan Yuille
Feng Wang
VGen
123
0
0
23 Nov 2025
Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning
Xiaohong Liu
Xiufeng Song
Huayu Zheng
Lei Bai
Xiaoming Liu
Guangtao Zhai
DiffM
140
0
0
22 Nov 2025
VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
Lingxiao Li
Y. Wang
Xinyan Gao
Chen Tang
Xiangyu Yue
Chenyu You
LRM
77
1
0
21 Nov 2025
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination
Y. Tang
Daiki Shimada
Hang Hua
Chao Huang
Jing Bi
Rogerio Feris
Chenliang Xu
241
0
0
21 Nov 2025
SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM
An Yu
Weiheng Lu
Jian Li
Zhenfei Zhang
Yunhang Shen
Felix X.-F. Ye
Ming-Ching Chang
161
1
0
18 Nov 2025
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Keda Tao
Kele Shao
Bohan Yu
Weiqiang Wang
Jian Liu
Huan Wang
VLM
253
2
0
18 Nov 2025
Minimax Multi-Target Conformal Prediction with Applications to Imaging Inverse Problems
Jeffrey Wen
Rizwan Ahmad
Philip Schniter
MedIm
332
0
0
17 Nov 2025
Learning Skill-Attributes for Transferable Assessment in Video
Kumar Ashutosh
Kristen Grauman
183
0
0
17 Nov 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Tianhao Peng
Haochen Wang
Yuanxing Zhang
Zekun Wang
Zili Wang
...
Wei Ji
Pengfei Wan
Wenhao Huang
Zhaoxiang Zhang
Jiaheng Liu
ELM
377
1
0
10 Nov 2025
VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
Ying Cheng
Y. Lin
Min-Hung Chen
Fu-En Yang
S. Lai
175
0
0
10 Nov 2025
LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
Zhenyu Yang
Kairui Zhang
Yuhang Hu
Bing Wang
Shengsheng Qian
Bin Wen
Fan Yang
Tingting Gao
Weiming Dong
Changsheng Xu
OffRL
AI4TS
VLM
260
0
0
07 Nov 2025
Cambrian-S: Towards Spatial Supersensing in Video
Shusheng Yang
J. Yang
Pinzhi Huang
Ellis L Brown
Zihao Yang
...
Daohan Lu
Rob Fergus
Yann LeCun
Li Fei-Fei
Saining Xie
173
15
0
06 Nov 2025
VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models
Zhicheng Zhang
Weicheng Wang
Yongjie Zhu
Wenyu Qin
Pengfei Wan
Di Zhang
Jufeng Yang
120
0
0
04 Nov 2025
Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders
Ali Rasekh
Erfan Bagheri Soula
Omid Daliran
Simon Gottschalk
Mohsen Fayyaz
93
0
0
29 Oct 2025
SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation
Wang zhi
Y. Liu
Liu Liu
Li Zhang
Ruixuan Lu
Dan Guo
61
0
0
29 Oct 2025
Positional Preservation Embedding for Multimodal Large Language Models
Mouxiao Huang
Borui Jiang
Dehua Zheng
Hailin Hu
Kai Han
Xinghao Chen
VLM
276
0
0
27 Oct 2025
VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations
Lu Dong
H. Zhang
Han Lin
Ziang Yan
Xiangyu Zeng
...
Yifei Huang
Yi Wang
Z. Ling
Limin Wang
Yali Wang
OffRL
160
1
0
27 Oct 2025
A Video Is Not Worth a Thousand Words
Sam Pollard
Michael Wray
107
0
0
27 Oct 2025
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Baoqi Pei
Yifei Huang
Jilan Xu
Yuping He
Guo Chen
Fei Wu
Yu Qiao
Jiangmiao Pang
EgoV
LRM
214
4
0
27 Oct 2025
HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models
Zelin Peng
Zhengqin Xu
Qingyang Liu
Xiaokang Yang
Wei Shen
233
0
0
23 Oct 2025
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
Yaning Pan
Z. Wang
Qianqian Xie
Yongqian Wen
Y. Zhang
...
An Ping
Tianhao Peng
Jiaheng Liu
165
4
0
20 Oct 2025
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen
Marc Pollefeys
Daniel Barath
Iro Armeni
VGen
221
2
0
20 Oct 2025
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick
E. Mavroudi
Yale Song
Rama Chellappa
Lorenzo Torresani
Triantafyllos Afouras
180
0
0
19 Oct 2025
EDVD-LLaMA: Explainable Deepfake Video Detection via Multimodal Large Language Model Reasoning
Haoran Sun
Chen Cai
Huiping Zhuang
Kong Aik Lee
Lap-Pui Chau
Yi Wang
122
0
0
18 Oct 2025
RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba
Kunyu Peng
Di Wen
Jia Fu
Jiamin Wu
Kailun Yang
...
Yufan Chen
Yuqian Fu
D. Paudel
Luc Van Gool
Rainer Stiefelhagen
133
0
0
18 Oct 2025
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
Jinglei Zhang
Yuanfan Guo
Rolandos Alexandros Potamias
Jiankang Deng
Hang Xu
Chao Ma
LRM
114
2
0
16 Oct 2025
MaskCaptioner: Learning to Jointly Segment and Caption Object Trajectories in Videos
Gabriel Fiastre
Antoine Yang
Cordelia Schmid
VOS
446
1
0
16 Oct 2025
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
Minji Kim
Taekyung Kim
Bohyung Han
95
0
0
15 Oct 2025
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
Xiaoqian Shen
Wenxuan Zhang
Jun-Cheng Chen
Mohamed Elhoseiny
VLM
LRM
111
4
0
15 Oct 2025
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
Run Luo
Xiaobo Xia
Lu Wang
Longze Chen
Renke Shan
Jing Luo
Min Yang
Tat-Seng Chua
VGen
240
4
0
15 Oct 2025
VideoLucy: Deep Memory Backtracking for Long Video Understanding
Jialong Zuo
Yongtai Deng
Lingdong Kong
J. Yang
Rui Jin
Y. Zhang
Nong Sang
Liang Pan
Ziwei Liu
Changxin Gao
141
2
0
14 Oct 2025
RO-Bench: Large-scale robustness evaluation of MLLMs with text-driven counterfactual videos
Zixi Yang
Jiapeng Li
Muxi Diao
Yinuo Jing
Kongming Liang
AAML
VGen
117
0
0
10 Oct 2025
Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization
Shuo Xing
Soumik Dey
Mingyang Wu
Ashirbad Mishra
Naveen Ravipati
Binbin Li
Hansi Wu
Zhengzhong Tu
179
1
0
09 Oct 2025
Addressing the ID-Matching Challenge in Long Video Captioning
Zhantao Yang
Huangji Wang
Ruili Feng
Han Zhang
Yuting Hu
Shangwen Zhu
Junyan Li
Yu Liu
Fan Cheng
116
0
0
08 Oct 2025
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
Ruyang Liu
Shangkun Sun
Haoran Tang
Ge Li
Wei-Nan Gao
VGen
VLM
96
3
0
07 Oct 2025
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
M. Luo
Zihui Xue
Alex Dimakis
Kristen Grauman
VGen
LRM
260
4
0
07 Oct 2025
Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning
C. Wang
Donglin Bai
Yifan Yang
Xiao Jin
Anlan Zhang
...
Jingdong Sun
Chong Luo
Ting Cao
Lili Qiu
Suman Banerjee
252
1
0
05 Oct 2025
HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference
Shubham Negi
Kaushik Roy
121
0
0
03 Oct 2025
Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback
Derek Shi
Ruben Glatt
Christine Klymko
Shubham Mohole
Hongjun Choi
Shashank Kushwaha
Sam Sakla
Felipe Leno Da Silva
AI4TS
VLM
179
0
0
02 Oct 2025