ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

27 April 2023
Junke Wang
Dongdong Chen
Chong Luo
Xiyang Dai
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang

Papers citing "ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System"

21 / 21 papers shown
PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement
Yu-Wei Zhan
Xin Wang
Hong Chen
Tongtong Feng
Wei Feng
Ren Wang
Guangyao Li
Qing Li
Wenwu Zhu
VGen
293
0
0
04 Dec 2025
SurgLLM: A Versatile Large Multimodal Model with Spatial Focus and Temporal Awareness for Surgical Video Understanding
Zhen Chen
Xingjian Luo
Kun Yuan
J. Wu
Danny Tat Ming Chan
Nassir Navab
Hongbin Liu
Zhen Lei
Jiebo Luo
193
2
0
30 Aug 2025
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
Wenbin An
Jiahao Nie
Yaqiang Wu
Feng Tian
Shijian Lu
Q. Zheng
MLLM
182
1
0
14 Aug 2025
VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering
Yiran Meng
Junhong Ye
Wei Zhou
Guanghui Yue
Xudong Mao
Ruomei Wang
Baoquan Zhao
118
0
0
05 Aug 2025
From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding
Xiangfeng Wang
Xiao Li
Yadong Wei
Xueyu Song
Yang Song
...
Fangrui Zeng
Zaiyi Chen
Liu Liu
Gu Xu
Tong Xu
VGen
119
0
0
03 Jul 2025
Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought
Shuyi Zhang
Xiaoshuai Hao
Yingbo Tang
Lingfeng Zhang
Pengwei Wang
Zhongyuan Wang
Hongxuan Ma
Shanghang Zhang
VGen AI4TS
341
12
0
10 Jun 2025
PVChat: Personalized Video Chat with One-Shot Learning
Yufei Shi
Weilong Yan
Gang Xu
Yumeng Li
Yongqian Li
Hao Sun
Fei Richard Yu
Ming Li
Si Yong Yeo
370
2
0
21 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Yue Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
927
9
0
08 Mar 2025
CoS: Chain-of-Shot Prompting for Long Video Understanding
Jian Hu
Zixu Cheng
Chenyang Si
Wei Li
Shaogang Gong
302
18
0
10 Feb 2025
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Quan Zhang
Yuxin Qi
Rui Yuan
Xi Tang
Yuxin Qi
Ke Zhang
Chun Yuan
268
5
0
13 Nov 2024
AppAgent v2: Advanced Agent for Flexible Mobile Interactions
Yanda Li
Chi Zhang
Wenjia Jiang
Wanqi Yang
Bin-Bin Fu
Pei Cheng
Xin Chen
Yunchao Wei
Y. X. Wei
LLMAG LM&Ro
469
49
0
05 Aug 2024
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang
Shoubin Yu
Elias Stengel-Eskin
Jaehong Yoon
Feng Cheng
Gedas Bertasius
Mohit Bansal
472
147
0
29 May 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Lei Li
Xi Li
Gaoang Wang
VLM MLLM
247
51
0
26 Apr 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua
Yunlong Tang
Chenliang Xu
Jiebo Luo
VGen
419
47
0
18 Apr 2024
VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning
Alexandros Xenos
Niki Maria Foteinopoulou
Ioanna Ntinou
Ioannis Patras
Georgios Tzimiropoulos
303
23
0
10 Apr 2024
LVCHAT: Facilitating Long Video Comprehension
Yu Wang
Zeyuan Zhang
Julian McAuley
Zexue He
VLM
145
6
0
19 Feb 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
International Conference on Machine Learning (ICML), 2024
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro LLMAG
510
63
0
16 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
713
167
0
29 Dec 2023
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
383
152
0
28 Dec 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
430
152
0
25 Jul 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
322
68
0
25 May 2023