arXiv: 2210.05234
It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training
11 October 2022
Yuxin Song
Min Yang
Wenhao Wu
Dongliang He
Fu Li
Jingdong Wang
ViT
Papers citing
"It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training"
7 / 7 papers shown
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
87
93
0
04 Jul 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
MVFNet: Multi-View Fusion Network for Efficient Video Recognition
Wenhao Wu
Dongliang He
Tianwei Lin
Fu Li
Chuang Gan
Errui Ding
85
68
0
13 Dec 2020
Self-supervised Co-training for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
198
371
0
19 Oct 2020