2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Papers citing
"VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 712 papers shown
Title
How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing
Shutong Jin
Ruiyu Wang
Muhammad Zahid
Florian T. Pokorny
21
1
0
03 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
27
8
0
02 Oct 2023
Win-Win: Training High-Resolution Vision Transformers from Two Windows
Vincent Leroy
Jérôme Revaud
Thomas Lucas
Philippe Weinzaepfel
ViT
32
2
0
01 Oct 2023
SimLVSeg: Simplifying Left Ventricular Segmentation in 2D+Time Echocardiograms with Self- and Weakly-Supervised Learning
F. Maani
Asim Ukaye
Nada Saadi
Numan Saeed
Mohammad Yaqub
84
1
0
30 Sep 2023
Towards Free Data Selection with General-Purpose Models
Alessandro Mutti
Mingyu Ding
Patrizia Semeraro
Wei Zhan
21
9
0
29 Sep 2023
CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding
Mingming Zhang
Qingjie Liu
Yunhong Wang
22
5
0
28 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
25
15
0
28 Sep 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
25
29
0
27 Sep 2023
M³3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
16
1
0
26 Sep 2023
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios
Francesco Ragusa
Rosario Leonardi
Michele Mazzamuto
Claudia Bonanno
Rosario Scavo
Antonino Furnari
G. Farinella
25
7
0
26 Sep 2023
IBVC: Interpolation-driven B-frame Video Compression
Chenming Xu
Meiqin Liu
Chao Yao
Weisi Lin
Yao Zhao
42
8
0
25 Sep 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
Deepak Gupta
Kush Attal
Dina Demner-Fushman
LM&MA
14
1
0
21 Sep 2023
AI Foundation Models for Weather and Climate: Applications, Design, and Implementation
S. K. Mukkavilli
Daniel Salles Civitarese
J. Schmude
Johannes Jakubik
Anne Jones
...
R. Ganti
Hendrik Hamann
U. Nair
Rahul Ramachandran
Kommy Weldemariam
AI4Cl
AI4CE
28
18
0
19 Sep 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
16
20
0
19 Sep 2023
Unsupervised Open-Vocabulary Object Localization in Videos
Ke Fan
Zechen Bai
Tianjun Xiao
Dominik Zietlow
Max Horn
...
Bernt Schiele
Thomas Brox
Zheng-Wei Zhang
Yanwei Fu
Tong He
38
9
0
18 Sep 2023
FrameRS: A Video Frame Compression Model Composed by Self supervised Video Frame Reconstructor and Key Frame Selector
Qiqian Fu
Guanhong Wang
Gaoang Wang
12
0
0
16 Sep 2023
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer
Fudong Lin
Summer Crawford
Kaleb Guillot
Yihe Zhang
Yan Chen
...
Tri Setiyono
B. Tubana
Lu Peng
Magdy A. Bayoumi
N. Tzeng
42
20
0
16 Sep 2023
RMP: A Random Mask Pretrain Framework for Motion Prediction
Yi Yang
Qingwen Zhang
Thomas Gilles
Nazre Batool
John Folkesson
46
5
0
16 Sep 2023
AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder
Xingjian Diao
Ming Cheng
Shitong Cheng
VGen
19
8
0
15 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
19
18
0
14 Sep 2023
SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition
Cong Wu
Xiaojun Wu
Josef Kittler
Tianyang Xu
Sara Atito
Muhammad Awais
Zhenhua Feng
22
3
0
11 Sep 2023
CDFSL-V: Cross-Domain Few-Shot Learning for Videos
Sarinda Samarasinghe
Mamshad Nayeem Rizve
Navid Kardan
M. Shah
13
11
0
07 Sep 2023
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers
J. Denize
Mykola Liashuha
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
ViT
13
13
0
03 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image Modeling
Qi Han
Yuxuan Cai
Xiangyu Zhang
33
7
0
02 Sep 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition
Marcelo Sandoval-Castaneda
Yanhong Li
D. Brentari
Karen Livescu
Gregory Shakhnarovich
SLR
8
2
0
02 Sep 2023
CL-MAE: Curriculum-Learned Masked Autoencoders
Neelu Madan
Nicolae-Cătălin Ristea
Kamal Nasrollahi
T. Moeslund
Radu Tudor Ionescu
17
10
0
31 Aug 2023
IndGIC: Supervised Action Recognition under Low Illumination
Jing-Teng Zeng
27
1
0
29 Aug 2023
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction
Umar Khalid
Hasan Iqbal
Saeed Vahidian
Jing Hua
C. L. P. Chen
19
3
0
29 Aug 2023
Self-Supervision for Tackling Unsupervised Anomaly Detection: Pitfalls and Opportunities
L. Akoglu
Jaemin Yoo
20
1
0
28 Aug 2023
EventTransAct: A video transformer-based framework for Event-camera based action recognition
Tristan de Blegiers
I. Dave
Adeel Yousaf
M. Shah
ViT
26
9
0
25 Aug 2023
Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning
P. Balaji
Abhijit Das
Srijan Das
A. Dantcheva
CVBM
11
4
0
25 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning
D. Fan
Jue Wang
Shuai Liao
Yi Zhu
Vimal Bhat
H. Santos-Villalobos
M. Rohith
Xinyu Li
VGen
18
19
0
24 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding
Mona Ahmadian
Frank Guerin
Andrew Gilbert
21
2
0
23 Aug 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation
Hejun Xiao
Kunyu Peng
Xiangsheng Huang
Alina Roitberg
Hao Li
Zhao Wang
Rainer Stiefelhagen
18
3
0
23 Aug 2023
Audio-Visual Class-Incremental Learning
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
20
27
0
21 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
22
30
0
21 Aug 2023
Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces
Juan Hu
Xin Liao
Difei Gao
Satoshi Tsutsui
Qian Wang
Zheng Qin
Mike Zheng Shou
CVBM
AAML
27
1
0
19 Aug 2023
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos
Zhiqiang Shen
Xiaoxiao Sheng
Hehe Fan
Longguang Wang
Y. Guo
Qiong Liu
Hao-Kai Wen
Xiaoping Zhou
3DPC
15
14
0
18 Aug 2023
Learning to In-paint: Domain Adaptive Shape Completion for 3D Organ Segmentation
Mingjin Chen
Yongkang He
Yongyi Lu
Zhi-Yi Yang
MedIm
19
0
0
17 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding
Jiahao Wang
Guo Chen
Yifei Huang
Liming Wang
Tong Lu
OffRL
54
37
0
15 Aug 2023
A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis
Esteve Valls Mascaro
Hyemin Ahn
Dongheui Lee
CVBM
29
4
0
14 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
28
9
0
10 Aug 2023
Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders
Simon Dahan
Logan Z. J. Williams
Yourong Guo
Daniel Rueckert
E. C. Robinson
27
0
0
10 Aug 2023
Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction
Izzeddin Teeti
Rongali Sai Bhargav
Vivek Singh
Andrew Bradley
Biplab Banerjee
Fabio Cuzzolin
19
1
0
08 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
16
16
0
08 Aug 2023
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Dongyang Yu
Shihao Wang
Yuan Fang
Wangpeng An
VGen
33
0
0
08 Aug 2023
Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
Ya Jing
Xuelin Zhu
Xingbin Liu
Qie Sima
Taozheng Yang
Yunhai Feng
Tao Kong
LM&Ro
25
16
0
07 Aug 2023
Multimodal Adaptation of CLIP for Few-Shot Action Recognition
Jiazheng Xing
Mengmeng Wang
Xiaojun Hou
Guangwen Dai
Jingdong Wang
Yong-Jin Liu
VLM
15
0
0
03 Aug 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song
Wenhao Chai
Guanhong Wang
Yucheng Zhang
Haoyang Zhou
...
Tianbo Ye
Yanting Zhang
Yang Lu
Jenq-Neng Hwang
Gaoang Wang
VLM
MLLM
22
260
0
31 Jul 2023
MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features
Adrien Bardes
Jean Ponce
Yann LeCun
MDE
31
23
0
24 Jul 2023