arXiv:2403.17935
OmniVid: A Generative Framework for Universal Video Understanding
26 March 2024
Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang
VLM
VGen
Papers citing "OmniVid: A Generative Framework for Universal Video Understanding"
Do Language Models Understand Time?
Xi Ding, Lei Wang
143
0
0
18 Dec 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim
MLLM
71
87
0
08 Apr 2024
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System
Junke Wang, Dongdong Chen, Chong Luo, Xiyang Dai, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang
84
54
0
27 Apr 2023
Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
Xin Chen, Houwen Peng, Jiawen Zhu, Dong Wang, Han Hu, Huchuan Lu
48
22
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Rethinking Resolution in the Context of Efficient Video Recognition
Chuofan Ma, Qiushan Guo, Yi-Xin Jiang, Zehuan Yuan, Ping Luo, Xiaojuan Qi
44
10
0
26 Sep 2022
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
Shilong Liu, Feng Li, Hao Zhang, X. Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang
ViT
132
703
0
28 Jan 2022
Pix2seq: A Language Modeling Framework for Object Detection
Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey E. Hinton
MLLM
ViT
VLM
223
341
0
22 Sep 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
ViT
272
1,939
0
09 Feb 2021
Video Transformer Network
Daniel Neimark, Omri Bar, Maya Zohar, Dotan Asselmann
ViT
188
375
0
01 Feb 2021
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka
VLM
113
119
0
03 Mar 2020
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu
56
158
0
27 Aug 2019
TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild
Matthias Müller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, Bernard Ghanem
192
676
0
28 Mar 2018