arXiv: 2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 712 papers shown
Title
DyG2Vec: Efficient Representation Learning for Dynamic Graphs
Mohammad Ali Alomrani
Mahdi Biparva
Yingxue Zhang
Mark J. Coates
AI4TS
34
3
0
30 Oct 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J Liang
Tal Hassner
James J. Clark
20
3
0
24 Oct 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
Mantas Mazeika
Eric Tang
Andy Zou
Steven Basart
Jun Shern Chan
Dawn Song
David A. Forsyth
Jacob Steinhardt
Dan Hendrycks
16
8
0
18 Oct 2022
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
Dasom Ahn
Sangwon Kim
H. Hong
ByoungChul Ko
ViT
26
92
0
14 Oct 2022
Masked Motion Encoding for Self-Supervised Video Representation Learning
Xinyu Sun
Peihao Chen
Liang-Chieh Chen
Chan Li
Thomas H. Li
Mingkui Tan
Chuang Gan
27
28
0
12 Oct 2022
It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training
Yuxin Song
Min Yang
Wenhao Wu
Dongliang He
Fu Li
Jingdong Wang
ViT
95
8
0
11 Oct 2022
Turbo Training with Token Dropout
Tengda Han
Weidi Xie
Andrew Zisserman
ViT
13
10
0
10 Oct 2022
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
H. Yao
Yi-Xin Jiang
Xiatian Zhu
Zehuan Yuan
24
19
0
09 Oct 2022
Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic
Tete Xiao
Stephen James
Pieter Abbeel
Jitendra Malik
Trevor Darrell
SSL
146
238
0
06 Oct 2022
Making Your First Choice: To Address Cold Start Problem in Vision Active Learning
Liangyu Chen
Yutong Bai
Siyu Huang
Yongyi Lu
B. Wen
Alan Yuille
Zongwei Zhou
9
23
0
05 Oct 2022
Backdoor Attacks in the Supply Chain of Masked Image Modeling
Xinyue Shen
Xinlei He
Zheng Li
Yun Shen
Michael Backes
Yang Zhang
32
7
0
04 Oct 2022
Towards a Unified View on Visual Parameter-Efficient Transfer Learning
Bruce X. B. Yu
Jianlong Chang
Lin Liu
Qi Tian
Changan Chen
VPVLM
VLM
68
33
0
03 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
19
119
0
02 Oct 2022
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
Ziyun Zeng
Yuying Ge
Xihui Liu
Bin Chen
Ping Luo
Shutao Xia
Yixiao Ge
AI4TS
29
8
0
30 Sep 2022
Hydra Attention: Efficient Attention with Many Heads
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Judy Hoffman
99
76
0
15 Sep 2022
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition
Farrukh Rahman
Ömer Mubarek
Z. Kira
ViT
10
2
0
15 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Saeed Mian
ViT
19
43
0
13 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
16
63
0
04 Sep 2022
TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut
Yangtao Wang
Xiaoke Shen
Yuan Yuan
Yuming Du
Maomao Li
S. Hu
James L. Crowley
Dominique Vaufreydaz
VOS
ViT
15
76
0
01 Sep 2022
Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective
Pengfei Wei
Lingdong Kong
Xinghua Qu
Yi Ren
Zhiqiang Xu
Jing Jiang
Xiang Yin
20
19
0
15 Aug 2022
Exploiting Feature Diversity for Make-up Temporal Video Grounding
Xiujun Shu
Wei Wen
Taian Guo
Su He
Chen Wu
Ruizhi Qiao
14
1
0
12 Aug 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
39
70
0
30 Jul 2022
MAR: Masked Autoencoders for Efficient Action Recognition
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Xiang Wang
Yuehuang Wang
Yiliang Lv
Changxin Gao
Nong Sang
19
42
0
24 Jul 2022
MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior
Jennifer J. Sun
Markus Marks
Andrew Ulmer
Dipam Chakraborty
Brian Geuther
...
Joseph Parker
Pietro Perona
Yisong Yue
K. Branson
Ann Kennedy
14
9
0
21 Jul 2022
E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context
Zizhang Li
Mengmeng Wang
Huaijin Pi
Kechun Xu
Jianbiao Mei
Yong Liu
16
70
0
17 Jul 2022
SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
Yezhen Cong
Samarth Khanna
Chenlin Meng
Patrick Liu
Erik Rozi
Yutong He
Marshall Burke
David B. Lobell
Stefano Ermon
ViT
9
238
0
17 Jul 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
20
46
0
14 Jul 2022
Masked Surfel Prediction for Self-Supervised Point Cloud Learning
Yabin Zhang
Jiehong Lin
Chenhang He
Y. Chen
K. Jia
Lei Zhang
3DPC
16
19
0
07 Jul 2022
Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds
Georg Hess
Johan Jaxing
Elias Svensson
David Hagerman
Christoffer Petersson
Lennart Svensson
3DPC
ViT
16
33
0
01 Jul 2022
Masked World Models for Visual Control
Younggyo Seo
Danijar Hafner
Hao Liu
Fangchen Liu
Stephen James
Kimin Lee
Pieter Abbeel
OffRL
77
144
0
28 Jun 2022
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Junting Pan
Ziyi Lin
Xiatian Zhu
Jing Shao
Hongsheng Li
12
188
0
27 Jun 2022
MaskViT: Masked Visual Pre-Training for Video Prediction
Agrim Gupta
Stephen Tian
Yunzhi Zhang
Jiajun Wu
Roberto Martín-Martín
Li Fei-Fei
100
109
0
23 Jun 2022
DDPM-CD: Denoising Diffusion Probabilistic Models as Feature Extractors for Change Detection
W. G. C. Bandara
Nithin Gopalakrishnan Nair
Vishal M. Patel
DiffM
14
4
0
23 Jun 2022
Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders
Chen Min
Xinli Xu
Dawei Zhao
Liang Xiao
Yiming Nie
Bin Dai
3DPC
25
48
0
20 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
22
130
0
18 Jun 2022
Masked Autoencoders for Generic Event Boundary Detection CVPR'2022 Kinetics-GEBD Challenge
Ruifei He
Yuanxi Sun
Youzeng Li
Zuwei Huang
Feng Hu
Xu Cheng
Jie Tang
17
3
0
17 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
25
97
0
16 Jun 2022
Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
Xiang Li
Jinghuan Shang
Srijan Das
Michael S. Ryoo
SSL
17
31
0
10 Jun 2022
On Data Scaling in Masked Image Modeling
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Yutong Lin
Yixuan Wei
Qi Dai
Han Hu
20
51
0
09 Jun 2022
GMML is All you Need
Sara Atito
Muhammad Awais
J. Kittler
ViT
VLM
34
18
0
30 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
141
635
0
26 May 2022
Cross-Architecture Self-supervised Video Representation Learning
Sheng Guo
Zihua Xiong
Yujie Zhong
Limin Wang
Xiaobo Guo
Bing Han
Weilin Huang
SSL
AI4TS
58
24
0
26 May 2022
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
Jihao Liu
Xin Huang
Jinliang Zheng
Yu Liu
Hongsheng Li
22
53
0
26 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
95
73
0
20 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Mingdong Yang
Guo Chen
Yin-Dong Zheng
Tong Lu
Limin Wang
27
45
0
05 May 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
9
43
0
26 Apr 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
6
54
0
06 Apr 2022
ObjectMix: Data Augmentation by Copy-Pasting Objects in Videos for Action Recognition
Jun Kimata
Tomoya Nitta
Toru Tamaki
23
10
0
01 Apr 2022
MixFormer: End-to-End Tracking with Iterative Mixed Attention
Yutao Cui
Jiang Cheng
Limin Wang
Gangshan Wu
VOT
23
452
0
21 Mar 2022