VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 712 papers shown

Title
Advanced Gesture Recognition in Autism: Integrating YOLOv7, Video Augmentation and VideoMAE for Video Analysis Amit Kumar Singh Trapti Shrivastava Vrijendra Singh 16 0 0 12 Oct 2024
Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture Sehun Kim 18 1 0 11 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation Haoyi Zhu Honghui Yang Yating Wang Jiange Yang Limin Wang Tong He 3DH 43 5 0 10 Oct 2024
The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024 Yinan Han Qingyuan Jiang Hongming Mei Yang Yang Jinhui Tang 17 0 0 08 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling Yongxin Guo Jingyu Liu Mingda Li Xiaoying Tang Qingbin Liu 30 14 0 08 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning Han Lin Tushar Nagarajan Nicolas Ballas Mido Assran Mojtaba Komeili Mohit Bansal Koustuv Sinha AI4TS 49 3 0 04 Oct 2024
AirLetters: An Open Video Dataset of Characters Drawn in the Air Rishit Dagli Guillaume Berger Joanna Materzynska Ingo Bax Roland Memisevic VGen 14 1 0 03 Oct 2024
An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos Arun V. Reddy Ketul Shah Corban Rivera William Paul Celso M. De Melo Rama Chellappa SLR 16 0 0 03 Oct 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations Minoh Jeong Min Namgung Zae Myung Kim Dongyeop Kang Yao-Yi Chiang Alfred Hero 23 0 0 02 Oct 2024
Pre-training with Synthetic Patterns for Audio Yuchi Ishikawa Tatsuya Komatsu Yoshimitsu Aoki 18 0 0 01 Oct 2024
TikGuard: A Deep Learning Transformer-Based Solution for Detecting Unsuitable TikTok Content for Kids Mazen Balat Mahmoud Essam Gabr Hend Bakr A. Zaky 16 1 0 01 Oct 2024
Loose Social-Interaction Recognition in Real-world Therapy Scenarios Abid Ali Rui Dai Ashish Marisetty Guillaume Astruc Monique Thonnat J. Odobez Susanne Thümmler Francois Bremond 29 1 0 30 Sep 2024
CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis Nishq Poorav Desai Ali Etemad Michael A. Greenspan 23 0 0 30 Sep 2024
Solution for Temporal Sound Localisation Task of ECCV Second Perception Test Challenge 2024 Haowei Gu Weihao Zhu Yang Yang 20 0 0 29 Sep 2024
Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis Shukesh Reddy Nishit Poddar Srijan Das Abhijit Das CVBM 20 0 0 29 Sep 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation Kun Su Xiulong Liu Eli Shlizerman VGen 28 6 0 27 Sep 2024
How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks? Jose Sosa Mohamed Aloulou Danila Rukhovich Rim Sleimi Boonyarit Changaival Anis Kacem Djamila Aouada 25 0 0 27 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining Ruiqi Xian Xiyang Wu Tianrui Guan Xijun Wang Boqing Gong Dinesh Manocha ViT 22 0 0 26 Sep 2024
Interpretable Action Recognition on Hard to Classify Actions Anastasia Anichenko Frank Guerin Andrew Gilbert 16 0 0 19 Sep 2024
Across-Game Engagement Modelling via Few-Shot Learning Kosmas Pinitas Konstantinos Makantasis Georgios N. Yannakakis 24 1 0 19 Sep 2024
Self-Supervised Pre-training Tasks for an fMRI Time-series Transformer in Autism Detection Yinchi Zhou Peiyu Duan Yuexi Du Nicha Dvornek MedIm 13 1 0 18 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities Bilal Faye Hanane Azzag M. Lebbah ObjD 23 0 0 17 Sep 2024
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion Lehong Wu Lilang Lin Jiahang Zhang Y. Ma Jiaying Liu DiffM 46 0 0 16 Sep 2024
Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better Mengying Ge Mingyang Li Dongkai Tang Pengbo Li Kuo Liu Shuhao Deng Songbai Pu L. Liu Yang Song Tao Zhang 23 0 0 12 Sep 2024
Data Collection-free Masked Video Modeling Yuchi Ishikawa Masayoshi Kondo Yoshimitsu Aoki ViT 19 1 0 10 Sep 2024
UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity Yicheng Fu R. Anantha Prabal Vashisht Jianpeng Cheng Etai Littwin 26 2 0 06 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding Yunze Man Shuhong Zheng Zhipeng Bao M. Hebert Liang-Yan Gui Yu-xiong Wang 70 15 0 05 Sep 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline Zhuolin Tan Chenqiang Gao Anyong Qin Ruixin Chen Tiecheng Song Feng Yang Deyu Meng 14 0 0 02 Sep 2024
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models Y. Guo Faizan Siddiqui Yang Zhao Rama Chellappa Shao-Yuan Lo LRM 24 2 0 31 Aug 2024
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning Zhiyuan Yan Yandan Zhao Shen Chen Xinghe Fu Taiping Yao Shouhong Ding Li Yuan 30 8 0 30 Aug 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer Shuai Peng Di Fu Baole Wei Yong Cao Liangcai Gao Zhi Tang ViT 30 1 0 30 Aug 2024
Online pre-training with long-form videos Itsuki Kato Kodai Kamiya Toru Tamaki OnRL 24 0 0 28 Aug 2024
Fine-grained length controllable video captioning with ordinal embeddings Tomoya Nitta Takumi Fukuzawa Toru Tamaki 25 0 0 27 Aug 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models Zejia Weng Xitong Yang Zhen Xing Zuxuan Wu Yu-Gang Jiang VGen DiffM 30 5 0 27 Aug 2024
MMASD+: A Novel Dataset for Privacy-Preserving Behavior Analysis of Children with Autism Spectrum Disorder Pavan Uttej Ravva Behdokht Kiafar Pinar Kullu Jicheng Li Anjana Bhat R. Barmaki 29 0 0 27 Aug 2024
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models Wentao Wu Fanghua Hong Xiao Wang Chenglong Li Jin Tang VLM 41 1 0 23 Aug 2024
Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended? Chen Liang Qiang Guo Xiaochao Qu Luoqi Liu Ting Liu VOS 32 0 0 20 Aug 2024
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition Zebang Cheng Shuyuan Tu Dawei Huang Minghan Li Xiaojiang Peng Zhi-Qi Cheng Alexander G. Hauptmann 43 2 0 20 Aug 2024
PooDLe: Pooled and dense self-supervised learning from naturalistic videos Alex N. Wang Christopher Hoang Yuwen Xiong Yann LeCun Mengye Ren 64 0 0 20 Aug 2024
An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs Eui Jun Hwang Sukmin Cho Junmyeong Lee Jong C. Park SLR 59 4 0 20 Aug 2024
VrdONE: One-stage Video Visual Relation Detection Xinjie Jiang Chenxi Zheng Xuemiao Xu Bangzhen Liu Weiying Zheng Huaidong Zhang Shengfeng He VGen VOS 37 3 0 18 Aug 2024
Flatten: Video Action Recognition is an Image Classification task Junlin Chen Chengcheng Xu Yangfan Xu Jian Yang Jun Yu Li Zhiping Shi 18 1 0 17 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos Guozhen Zhang Jingyu Liu Shengming Cao Xiaotong Zhao Kevin Zhao Kai Ma Limin Wang ViT 27 1 0 13 Aug 2024
Membership Inference Attack Against Masked Image Modeling Z. Li Xinlei He Ning Yu Yang Zhang 38 1 0 13 Aug 2024
Masked Image Modeling: A Survey Vlad Hondru Florinel-Alin Croitoru Shervin Minaee Radu Tudor Ionescu N. Sebe 59 6 0 13 Aug 2024
Deep Multimodal Collaborative Learning for Polyp Re-Identification Suncheng Xiang Jincheng Li Zhengjie Zhang Shilun Cai Jiale Guan Dahong Qian 20 0 0 12 Aug 2024
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning Rex Liu Xin Liu 18 1 0 08 Aug 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling Seok Hwan Lee Taein Son Soo Won Seo Jisong Kim Jun Won Choi 37 0 0 07 Aug 2024
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation Xiaofeng Mao Zhengkai Jiang Qilin Wang Chencan Fu Jiangning Zhang Jiafu Wu Yabiao Wang Chengjie Wang Wei Li Mingmin Chi 70 4 0 06 Aug 2024
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation Xin Liu Chao Hao Zitong Yu Huanjing Yue Jingyu Yang 23 1 0 05 Aug 2024