ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

arXiv: 2304.14407

27 April 2023

Lu Yuan

Zuxuan Wu

Papers citing "ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System"

21 papers

PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement

04 Dec 2025

SurgLLM: A Versatile Large Multimodal Model with Spatial Focus and Temporal Awareness for Surgical Video Understanding

Danny Tat Ming Chan

30 Aug 2025

Empowering Multimodal LLMs with External Tools: A Comprehensive Survey

14 Aug 2025

VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering

05 Aug 2025

From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding

...

03 Jul 2025

Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought

Shanghang Zhang

10 Jun 2025

PVChat: Personalized Video Chat with One-Shot Learning

21 Mar 2025

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

08 Mar 2025

CoS: Chain-of-Shot Prompting for Long Video Understanding

10 Feb 2025

Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models

Computer Vision and Pattern Recognition (CVPR), 2024

13 Nov 2024

AppAgent v2: Advanced Agent for Flexible Mobile Interactions

05 Aug 2024

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

Elias Stengel-Eskin

Gedas Bertasius

Mohit Bansal

29 May 2024

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

Xi Li

26 Apr 2024

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

18 Apr 2024

VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning

Alexandros Xenos

Niki Maria Foteinopoulou

Georgios Tzimiropoulos

10 Apr 2024

LVCHAT: Facilitating Long Video Comprehension

Julian McAuley

19 Feb 2024

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)

International Conference on Machine Learning (ICML), 2024

Guikun Chen

16 Jan 2024

Video Understanding with Large Language Models: A Survey

...

29 Dec 2023

A Simple LLM Framework for Long-Range Video Question-Answering

Md. Mohaiminul Islam

Mohit Bansal

Gedas Bertasius

28 Dec 2023

Foundational Models Defining a New Era in Vision: A Survey and Outlook

Muzammal Naseer

Salman Khan

Rao Muhammad Anwer

Hisham Cholakkal

Ming-Hsuan Yang

Fahad Shahbaz Khan

25 Jul 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

Jing Liu

25 May 2023