arXiv:2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Papers citing
"VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 712 papers shown
Title
Visual Tuning
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
39
38
0
10 May 2023
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
16
836
0
09 May 2023
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
Zhaoyang Liu
Yinan He
Wenhai Wang
Weiyun Wang
Yi Wang
...
Yali Wang
Limin Wang
Ping Luo
Jifeng Dai
Yu Qiao
LRM
MLLM
12
79
0
09 May 2023
PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos
Zhiqiang Shen
Xiaoxiao Sheng
Longguang Wang
Y. Guo
Qiong Liu
Xiaoping Zhou
3DPC
SSL
18
14
0
06 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
79
6
0
05 May 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
31
272
0
24 Apr 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
32
13
0
24 Apr 2023
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Johannes Lehner
Benedikt Alkin
Andreas Fürst
Elisabeth Rumetshofer
Lukas Miklautz
Sepp Hochreiter
19
18
0
20 Apr 2023
Transformer-Based Visual Segmentation: A Survey
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
ViT
MedIm
37
132
0
19 Apr 2023
MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning
Zheng Lian
Haiyang Sun
Licai Sun
Kang Chen
Mingyu Xu
...
Meng Wang
Erik Cambria
Guoying Zhao
Björn W. Schuller
Jianhua Tao
22
47
0
18 Apr 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
36
14
0
17 Apr 2023
The 7th AI City Challenge
M. Naphade
Shuo Wang
D. Anastasiu
Zhenghang Tang
Ming-Ching Chang
...
Alice Li
Shangru Li
Krishna Kunadharaju
Shenxin Jiang
Ramalingam Chellappa
36
53
0
15 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
44
3,011
0
14 Apr 2023
Hard Patches Mining for Masked Image Modeling
Haochen Wang
Kaiyou Song
Junsong Fan
Yuxi Wang
Jin Xie
Zhaoxiang Zhang
29
59
0
12 Apr 2023
Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection
Wei-Jhe Huang
Jheng-Hsien Yeh
Min-Hung Chen
Gueter Josmy Faure
S. Lai
20
3
0
10 Apr 2023
On Robustness in Multimodal Learning
Brandon McKinzie
Joseph Cheng
Vaishaal Shankar
Yinfei Yang
Jonathon Shlens
Alexander Toshev
30
2
0
10 Apr 2023
StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation
Francesco Ragusa
G. Farinella
Antonino Furnari
16
18
0
08 Apr 2023
Self-Supervised Video Similarity Learning
Giorgos Kordopatis-Zilos
Giorgos Tolias
Christos Tzelepis
I. Kompatsiaris
Ioannis Patras
Symeon Papadopoulos
SSL
19
8
0
06 Apr 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition
Jathushan Rajasegaran
Georgios Pavlakos
Angjoo Kanazawa
Christoph Feichtenhofer
Jitendra Malik
26
30
0
03 Apr 2023
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yong-Lu Li
Xiaoqian Wu
Xinpeng Liu
Zehao Wang
Yiming Dou
...
Junyi Zhang
Yixing Li
Jingru Tan
Xudong Lu
Cewu Lu
22
17
0
02 Apr 2023
Complementary Random Masking for RGB-Thermal Semantic Segmentation
Ukcheol Shin
Kyunghyun Lee
In So Kweon
Jean Oh
19
20
0
30 Mar 2023
Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
Chongjian Ge
Jiangliu Wang
Zhan Tong
Shoufa Chen
Yibing Song
Ping Luo
SSL
15
26
0
30 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
44
324
0
29 Mar 2023
Language-Guided Audio-Visual Source Separation via Trimodal Consistency
Reuben Tan
Arijit Ray
Andrea Burns
Bryan A. Plummer
Justin Salamon
Oriol Nieto
Bryan C. Russell
Kate Saenko
20
20
0
28 Mar 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
I. Dave
Mamshad Nayeem Rizve
C. L. P. Chen
M. Shah
TTA
36
15
0
28 Mar 2023
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
23
3
0
28 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
30
154
0
28 Mar 2023
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval
Qingguo Chen
Shilun Cai
C. Cai
Zefang Yu
Dahong Qian
Suncheng Xiang
28
7
0
28 Mar 2023
Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder
Tao Sun
Lu Pang
Chao Chen
Haibin Ling
AAML
33
9
0
27 Mar 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
20
93
0
25 Mar 2023
3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
Lei Wang
Piotr Koniusz
ViT
21
45
0
25 Mar 2023
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck
Jongheon Jeong
Sihyun Yu
Hankook Lee
Jinwoo Shin
AAML
31
0
0
24 Mar 2023
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
Runsen Xu
Tai Wang
Wenwei Zhang
Runjian Chen
Jinkun Cao
Jiangmiao Pang
Dahua Lin
3DPC
29
29
0
23 Mar 2023
A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition
Andong Deng
Taojiannan Yang
C. L. P. Chen
AI4TS
22
12
0
23 Mar 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
105
63
0
23 Mar 2023
POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery
Ce Zheng
Xianpeng Liu
Guo-Jun Qi
C. L. P. Chen
3DH
113
32
0
23 Mar 2023
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
J. Hernandez
Ruben Villegas
Vicente Ordonez
SSL
29
4
0
21 Mar 2023
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
Fida Mohammad Thoker
Hazel Doughty
Cees G. M. Snoek
ViT
35
9
0
20 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Jungin Park
Jiyoung Lee
K. Sohn
ViT
19
37
0
17 Mar 2023
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models
Xufeng Zhao
Mengdi Li
C. Weber
Muhammad Burhan Hafez
S. Wermter
LLMAG
LM&Ro
LRM
102
47
0
14 Mar 2023
Traj-MAE: Masked Autoencoders for Trajectory Prediction
Hao Chen
Jiaze Wang
Kun Shao
Furui Liu
Jianye Hao
Chenyong Guan
Guangyong Chen
Pheng-Ann Heng
55
38
0
12 Mar 2023
Improving Masked Autoencoders by Learning Where to Mask
Haijia Chen
Wendong Zhang
Yunbo Wang
Xiaokang Yang
SSL
13
20
0
12 Mar 2023
Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection
Juan Hu
Xin Liao
Difei Gao
Satoshi Tsutsui
Qian Wang
Zheng Qin
Mike Zheng Shou
AAML
11
4
0
03 Mar 2023
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Xiaoyu Shi
Zhaoyang Huang
Dasong Li
Manyuan Zhang
Ka Chun Cheung
Simon See
Hongwei Qin
Jifeng Dai
Hongsheng Li
25
82
0
02 Mar 2023
Valid Information Guidance Network for Compressed Video Quality Enhancement
Xuan Sun
Ziyue Zhang
Guannan Chen
Dan Zhu
31
0
0
28 Feb 2023
Remote Sensing Scene Classification with Masked Image Modeling (MIM)
Liya Wang
A. Tien
19
3
0
28 Feb 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Percy Liang
LM&Ro
SSL
11
144
0
24 Feb 2023
Delving into Identify-Emphasize Paradigm for Combating Unknown Bias
Bowen Zhao
Chen Chen
Qian-Wei Wang
Anfeng He
Shutao Xia
13
1
0
22 Feb 2023
Towards Efficient Visual Adaption via Structural Re-parameterization
Gen Luo
Minglang Huang
Yiyi Zhou
Xiaoshuai Sun
Guannan Jiang
Zhiyu Wang
Rongrong Ji
VLM
VPVLM
8
77
0
16 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
8
7
0
16 Feb 2023