VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 712 papers shown

Title
How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing Shutong Jin Ruiyu Wang Muhammad Zahid Florian T. Pokorny 21 1 0 03 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video Xinhao Li Yuhan Zhu Limin Wang VLM 27 8 0 02 Oct 2023
Win-Win: Training High-Resolution Vision Transformers from Two Windows Vincent Leroy Jérôme Revaud Thomas Lucas Philippe Weinzaepfel ViT 32 2 0 01 Oct 2023
SimLVSeg: Simplifying Left Ventricular Segmentation in 2D+Time Echocardiograms with Self- and Weakly-Supervised Learning F. Maani Asim Ukaye Nada Saadi Numan Saeed Mohammad Yaqub 84 1 0 30 Sep 2023
Towards Free Data Selection with General-Purpose Models Alessandro Mutti Mingyu Ding Patrizia Semeraro Wei Zhan 21 9 0 29 Sep 2023
CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding Mingming Zhang Qingjie Liu Yunhong Wang 22 5 0 28 Sep 2023
Training a Large Video Model on a Single Machine in a Day Yue Zhao Philipp Krahenbuhl VLM 25 15 0 28 Sep 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning Ruyang Liu Chen Li Yixiao Ge Ying Shan Thomas H. Li Ge Li 25 29 0 27 Sep 2023
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding Muhammad Abdullah Jamal Omid Mohareri 3DPC 16 1 0 26 Sep 2023
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios Francesco Ragusa Rosario Leonardi Michele Mazzamuto Claudia Bonanno Rosario Scavo Antonino Furnari G. Farinella 25 7 0 26 Sep 2023
IBVC: Interpolation-driven B-frame Video Compression Chenming Xu Meiqin Liu Chao Yao Weisi Lin Yao Zhao 42 8 0 25 Sep 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches Deepak Gupta Kush Attal Dina Demner-Fushman LM&MA 14 1 0 21 Sep 2023
AI Foundation Models for Weather and Climate: Applications, Design, and Implementation S. K. Mukkavilli Daniel Salles Civitarese J. Schmude Johannes Jakubik Anne Jones ... R. Ganti Hendrik Hamann U. Nair Rahul Ramachandran Kommy Weldemariam AI4Cl AI4CE 28 18 0 19 Sep 2023
FoleyGen: Visually-Guided Audio Generation Xinhao Mei Varun K. Nagaraja Gaël Le Lan Zhaoheng Ni Ernie Chang Yangyang Shi Vikas Chandra VGen 16 20 0 19 Sep 2023
Unsupervised Open-Vocabulary Object Localization in Videos Ke Fan Zechen Bai Tianjun Xiao Dominik Zietlow Max Horn ... Bernt Schiele Thomas Brox Zheng-Wei Zhang Yanwei Fu Tong He 38 9 0 18 Sep 2023
FrameRS: A Video Frame Compression Model Composed by Self supervised Video Frame Reconstructor and Key Frame Selector Qiqian Fu Guanhong Wang Gaoang Wang 12 0 0 16 Sep 2023
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer Fudong Lin Summer Crawford Kaleb Guillot Yihe Zhang Yan Chen ... Tri Setiyono B. Tubana Lu Peng Magdy A. Bayoumi N. Tzeng 42 20 0 16 Sep 2023
RMP: A Random Mask Pretrain Framework for Motion Prediction Yi Yang Qingwen Zhang Thomas Gilles Nazre Batool John Folkesson 46 5 0 16 Sep 2023
AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder Xingjian Diao Ming Cheng Shitong Cheng VGen 19 8 0 15 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning Zhiwu Qing Shiwei Zhang Ziyuan Huang Yingya Zhang Changxin Gao Deli Zhao Nong Sang 19 18 0 14 Sep 2023
SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition Cong Wu Xiaojun Wu Josef Kittler Tianyang Xu Sara Atito Muhammad Awais Zhenhua Feng 22 3 0 11 Sep 2023
CDFSL-V: Cross-Domain Few-Shot Learning for Videos Sarinda Samarasinghe Mamshad Nayeem Rizve Navid Kardan M. Shah 13 11 0 07 Sep 2023
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers J. Denize Mykola Liashuha Jaonary Rabarisoa Astrid Orcesi Romain Hérault ViT 13 13 0 03 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image Modeling Qi Han Yuxuan Cai Xiangyu Zhang 33 7 0 02 Sep 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition Marcelo Sandoval-Castaneda Yanhong Li D. Brentari Karen Livescu Gregory Shakhnarovich SLR 8 2 0 02 Sep 2023
CL-MAE: Curriculum-Learned Masked Autoencoders Neelu Madan Nicolae-Cătălin Ristea Kamal Nasrollahi T. Moeslund Radu Tudor Ionescu 17 10 0 31 Aug 2023
IndGIC: Supervised Action Recognition under Low Illumination Jing-Teng Zeng 27 1 0 29 Aug 2023
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction Umar Khalid Hasan Iqbal Saeed Vahidian Jing Hua C. L. P. Chen 19 3 0 29 Aug 2023
Self-Supervision for Tackling Unsupervised Anomaly Detection: Pitfalls and Opportunities L. Akoglu Jaemin Yoo 20 1 0 28 Aug 2023
EventTransAct: A video transformer-based framework for Event-camera based action recognition Tristan de Blegiers I. Dave Adeel Yousaf M. Shah ViT 26 9 0 25 Aug 2023
Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning P. Balaji Abhijit Das Srijan Das A. Dantcheva CVBM 11 4 0 25 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning D. Fan Jue Wang Shuai Liao Yi Zhu Vimal Bhat H. Santos-Villalobos M. Rohith Xinyu Li VGen 18 19 0 24 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding Mona Ahmadian Frank Guerin Andrew Gilbert 21 2 0 23 Aug 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation Hejun Xiao Kunyu Peng Xiangsheng Huang Alina Roitberg Hao Li Zhao Wang Rainer Stiefelhagen 18 3 0 23 Aug 2023
Audio-Visual Class-Incremental Learning Weiguo Pian Shentong Mo Yunhui Guo Yapeng Tian CLL VLM 20 27 0 21 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding Bingkun Huang Zhiyu Zhao Guozhen Zhang Yu Qiao Limin Wang 22 30 0 21 Aug 2023
Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces Juan Hu Xin Liao Difei Gao Satoshi Tsutsui Qian Wang Zheng Qin Mike Zheng Shou CVBM AAML 27 1 0 19 Aug 2023
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos Zhiqiang Shen Xiaoxiao Sheng Hehe Fan Longguang Wang Y. Guo Qiong Liu Hao-Kai Wen Xiaoping Zhou 3DPC 15 14 0 18 Aug 2023
Learning to In-paint: Domain Adaptive Shape Completion for 3D Organ Segmentation Mingjin Chen Yongkang He Yongyi Lu Zhi-Yi Yang MedIm 19 0 0 17 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding Jiahao Wang Guo Chen Yifei Huang Liming Wang Tong Lu OffRL 54 37 0 15 Aug 2023
A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis Esteve Valls Mascaro Hyemin Ahn Dongheui Lee CVBM 29 4 0 14 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding Ziyuan Huang Shiwei Zhang Liang Pan Zhiwu Qing Yingya Zhang Ziwei Liu Marcelo H. Ang 28 9 0 10 Aug 2023
Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders Simon Dahan Logan Z. J. Williams Yourong Guo Daniel Rueckert E. C. Robinson 27 0 0 10 Aug 2023
Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction Izzeddin Teeti Rongali Sai Bhargav Vivek Singh Andrew Bradley Biplab Banerjee Fabio Cuzzolin 19 1 0 08 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation Shuangrui Ding Peisen Zhao Xiaopeng Zhang Rui Qian H. Xiong Qi Tian ViT 16 16 0 08 Aug 2023
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation Dongyang Yu Shihao Wang Yuan Fang Wangpeng An VGen 33 0 0 08 Aug 2023
Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods Ya Jing Xuelin Zhu Xingbin Liu Qie Sima Taozheng Yang Yunhai Feng Tao Kong LM&Ro 25 16 0 07 Aug 2023
Multimodal Adaptation of CLIP for Few-Shot Action Recognition Jiazheng Xing Mengmeng Wang Xiaojun Hou Guangwen Dai Jingdong Wang Yong-Jin Liu VLM 15 0 0 03 Aug 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding Enxin Song Wenhao Chai Guanhong Wang Yucheng Zhang Haoyang Zhou ... Tianbo Ye Yanting Zhang Yang Lu Jenq-Neng Hwang Gaoang Wang VLM MLLM 22 260 0 31 Jul 2023
MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features Adrien Bardes Jean Ponce Yann LeCun MDE 31 23 0 24 Jul 2023