arXiv: 2305.06355

VideoChat: Chat-Centric Video Understanding

10 May 2023

Yi Wang

Ping Luo

Yu Qiao

Papers citing "VideoChat: Chat-Centric Video Understanding"

50 / 561 papers shown

SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding

222

0

0

04 Dec 2025

PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement

288

0

0

04 Dec 2025

ViDiC: Video Difference Captioning

153

0

0

03 Dec 2025

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

165

0

0

01 Dec 2025

Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?

Apratim Bhattacharyya

Roland Memisevic

112

0

0

27 Nov 2025

AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs

212

0

0

26 Nov 2025

Unboxing the Black Box: Mechanistic Interpretability for Algorithmic Understanding of Neural Networks

Bianka Kowalska

Halina Kwaśnicka

179

0

0

24 Nov 2025

VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning

...

324

3

0

24 Nov 2025

VDC-Agent: When Video Detailed Captioners Evolve Themselves via Agentic Self-Reflection

155

1

0

24 Nov 2025

VideoPerceiver: Enhancing Fine-Grained Temporal Perception in Video Multimodal Large Language Models

Fufangchen Zhao

140

0

0

24 Nov 2025

EventBench: Towards Comprehensive Benchmarking of Event-based MLLMs

73

0

0

23 Nov 2025

ViMix-14M: A Curated Multi-Source Video-Text Dataset with Long-Form, High-Quality Captions and Crawl-Free Access

123

0

0

23 Nov 2025

Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning

140

0

0

22 Nov 2025

VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning

77

1

0

21 Nov 2025

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination

241

0

0

21 Nov 2025

SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM

Ming-Ching Chang

161

1

0

18 Nov 2025

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

253

2

0

18 Nov 2025

Minimax Multi-Target Conformal Prediction with Applications to Imaging Inverse Problems

Philip Schniter

332

0

0

17 Nov 2025

Learning Skill-Attributes for Transferable Assessment in Video

Kristen Grauman

183

0

0

17 Nov 2025

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

...

Zhaoxiang Zhang

377

1

0

10 Nov 2025

VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models

175

0

0

10 Nov 2025

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding

Shengsheng Qian

OffRL AI4TS VLM

260

0

0

07 Nov 2025

Cambrian-S: Towards Spatial Supersensing in Video

...

173

15

0

06 Nov 2025

VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models

120

0

0

04 Nov 2025

Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders

Erfan Bagheri Soula

Simon Gottschalk

93

0

0

29 Oct 2025

SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation

61

0

0

29 Oct 2025

Positional Preservation Embedding for Multimodal Large Language Models

276

0

0

27 Oct 2025

VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations

...

160

1

0

27 Oct 2025

A Video Is Not Worth a Thousand Words

107

0

0

27 Oct 2025

EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT

214

4

0

27 Oct 2025

HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models

233

0

0

23 Oct 2025

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

...

Tianhao Peng

Jiaheng Liu

165

4

0

20 Oct 2025

HouseTour: A Virtual Real Estate A(I)gent

221

2

0

20 Oct 2025

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs

Shraman Pramanick

Lorenzo Torresani

Triantafyllos Afouras

180

0

0

19 Oct 2025

EDVD-LLaMA: Explainable Deepfake Video Detection via Multimodal Large Language Model Reasoning

122

0

0

18 Oct 2025

RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba

...

Rainer Stiefelhagen

133

0

0

18 Oct 2025

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

Rolandos Alexandros Potamias

114

2

0

16 Oct 2025

MaskCaptioner: Learning to Jointly Segment and Caption Object Trajectories in Videos

Gabriel Fiastre

Cordelia Schmid

446

1

0

16 Oct 2025

Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs

95

0

0

15 Oct 2025

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding

Mohamed Elhoseiny

111

4

0

15 Oct 2025

NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching

240

4

0

15 Oct 2025

VideoLucy: Deep Memory Backtracking for Long Video Understanding

141

2

0

14 Oct 2025

RO-Bench: Large-scale robustness evaluation of MLLMs with text-driven counterfactual videos

117

0

0

10 Oct 2025

Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization

Ashirbad Mishra

Naveen Ravipati

179

1

0

09 Oct 2025

Addressing the ID-Matching Challenge in Long Video Captioning

116

0

0

08 Oct 2025

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow

96

3

0

07 Oct 2025

When Thinking Drifts: Evidential Grounding for Robust Video Reasoning

Kristen Grauman

260

4

0

07 Oct 2025

Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning

...

252

1

0

05 Oct 2025

HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference

121

0

0

03 Oct 2025

Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback

Christine Klymko

Shashank Kushwaha

Felipe Leno Da Silva

179

0

0

02 Oct 2025
