Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.14407
Cited By
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System
27 April 2023
Junke Wang
Dongdong Chen
Chong Luo
Xiyang Dai
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System"
4 / 4 papers shown
Title
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Y. Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
32
0
0
08 Mar 2025
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
26
35
0
16 Jan 2024
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
107
148
0
14 Oct 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
229
499
0
22 Apr 2021
1