VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
    ViT
arXiv: 2203.12602

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 712 papers shown
Multitask Learning Can Improve Worst-Group Outcomes
Atharva Kulkarni
Lucio Dery
Amrith Rajagopal Setlur
Aditi Raghunathan
Ameet Talwalkar
Graham Neubig
32
1
0
05 Dec 2023
Are Vision Transformers More Data Hungry Than Newborn Visual Systems?
Lalit Pandey
Samantha M. W. Wood
Justin N. Wood
31
11
0
05 Dec 2023
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
37
4
0
05 Dec 2023
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao
Zhan Tong
K. Lin
Joya Chen
Mike Zheng Shou
33
0
0
04 Dec 2023
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Min Yang
Huan Gao
Ping Guo
Limin Wang
ViT
28
5
0
04 Dec 2023
SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu
Benran Hu
Chi-Keung Tang
Yu-Wing Tai
24
11
0
03 Dec 2023
Learning from One Continuous Video Stream
João Carreira
Michael King
Viorica Patraucean
Dilara Gokay
Catalin Ionescu
...
Joseph Heyward
Carl Doersch
Y. Aytar
Dima Damen
Andrew Zisserman
CLL
18
4
0
01 Dec 2023
Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting
Haotian Gao
Renhe Jiang
Zheng Dong
Jinliang Deng
Yuxin Ma
Xuan Song
AI4TS
44
15
0
01 Dec 2023
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
23
49
0
01 Dec 2023
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang
Yue Xu
Cewu Lu
Yong-Lu Li
DD
22
8
0
01 Dec 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
33
11
0
30 Nov 2023
DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding
Kyungho Bae
Geo Ahn
Youngrae Kim
Jinwoo Choi
21
2
0
30 Nov 2023
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung
Shu-Wei Lu
Yi-Hsuan Tsai
Yi-Ting Chen
30
6
0
29 Nov 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
38
0
0
28 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Bernard Ghanem
24
25
0
28 Nov 2023
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
Yuwei Guo
Ceyuan Yang
Anyi Rao
Maneesh Agrawala
Dahua Lin
Bo Dai
DiffM
VGen
26
114
0
28 Nov 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
21
1
0
28 Nov 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
21
17
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
25
6
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
29
3
0
25 Nov 2023
VLM-Eval: A General Evaluation on Video Large Language Models
Shuailin Li
Yuang Zhang
Yucheng Zhao
Qiuyue Wang
Fan Jia
Yingfei Liu
Tiancai Wang
MLLM
ELM
26
2
0
20 Nov 2023
Pair-wise Layer Attention with Spatial Masking for Video Prediction
Ping Li
Chenhan Zhang
Zheng Yang
Xianghua Xu
Mingli Song
19
0
0
19 Nov 2023
Multi-entity Video Transformers for Fine-Grained Video Representation Learning
Matthew Walmer
Rose Kanjirathinkal
Kai Sheng Tai
Keyur Muzumdar
Taipeng Tian
Abhinav Shrivastava
ViT
13
0
0
17 Nov 2023
Language Semantic Graph Guided Data-Efficient Learning
Wenxuan Ma
Shuang Li
Lincan Cai
Jingxuan Kang
32
4
0
15 Nov 2023
SpectralGPT: Spectral Remote Sensing Foundation Model
Danfeng Hong
Bing Zhang
Xuyang Li
Yuxuan Li
Chenyu Li
...
Xiuping Jia
Antonio J. Plaza
Paolo Gamba
J. Benediktsson
J. Chanussot
30
383
0
13 Nov 2023
Learning Human Action Recognition Representations Without Real Humans
Howard Zhong
Samarth Mishra
Donghyun Kim
SouYoung Jin
Rameswar Panda
Hildegard Kuehne
Leonid Karlinsky
Venkatesh Saligrama
Aude Oliva
Rogerio Feris
24
3
0
10 Nov 2023
Semantic-aware Video Representation for Few-shot Action Recognition
Yutao Tang
Benjamin Bejar
René Vidal
40
7
0
10 Nov 2023
Window Attention is Bugged: How not to Interpolate Position Embeddings
Daniel Bolya
Chaitanya K. Ryali
Judy Hoffman
Christoph Feichtenhofer
43
10
0
09 Nov 2023
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao
Bingkun Huang
Sen Xing
Gangshan Wu
Yu Qiao
Limin Wang
29
5
0
06 Nov 2023
Holistic Representation Learning for Multitask Trajectory Anomaly Detection
Alexandros Stergiou
B. D. Weerdt
Nikos Deligiannis
35
13
0
03 Nov 2023
FLAP: Fast Language-Audio Pre-training
Ching-Feng Yeh
Po-Yao Huang
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
CLIP
VLM
36
8
0
02 Nov 2023
Concatenated Masked Autoencoders as Spatial-Temporal Learner
Zhouqiang Jiang
Bowen Wang
Tong Xiang
Zhaofeng Niu
Hong Tang
Guangshun Li
Liangzhi Li
14
2
0
02 Nov 2023
Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders
Srijan Das
Tanmay Jain
Dominick Reilly
P. Balaji
Soumyajit Karmakar
Shyam Marjit
Xiang Li
Abhijit Das
Michael S. Ryoo
32
16
0
31 Oct 2023
HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
Junkun Yuan
Xinyu Zhang
Hao Zhou
Jian Wang
Zhongwei Qiu
...
Junyu Han
Errui Ding
Lanfen Lin
Fei Wu
Jingdong Wang
30
18
0
31 Oct 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP
VLM
VGen
35
2
0
30 Oct 2023
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
S. Sastry
Subash Khanal
A. Dhakal
Di Huang
Nathan Jacobs
40
9
0
29 Oct 2023
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Shuhuai Ren
Sishuo Chen
Shicheng Li
Xu Sun
Lu Hou
ViT
36
28
0
29 Oct 2023
Foundation Models for Generalist Geospatial Artificial Intelligence
Johannes Jakubik
Sujit Roy
C. Phillips
P. Fraccaro
Denys Godwin
...
Hamed Alemohammad
M. Maskey
R. Ganti
Kommy Weldemariam
Rahul Ramachandran
AI4CE
VLM
21
91
0
28 Oct 2023
Bridging The Gaps Between Token Pruning and Full Pre-training via Masked Fine-tuning
Fengyuan Shi
Limin Wang
ViT
24
0
0
26 Oct 2023
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
Ziqi Pang
Ziyang Xie
Yunze Man
Yu-xiong Wang
40
25
0
19 Oct 2023
Runner re-identification from single-view running video in the open-world setting
Tomohiro Suzuki
Kazushi Tsutsui
K. Takeda
Keisuke Fujii
23
1
0
18 Oct 2023
An Unbiased Look at Datasets for Visuo-Motor Pre-Training
Sudeep Dasari
M. K. Srirama
Unnat Jain
Abhinav Gupta
SSL
32
34
0
13 Oct 2023
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang
Sha Zhang
Di Huang
Xiaoyang Wu
Haoyi Zhu
...
Hengshuang Zhao
Qibo Qiu
Binbin Lin
Xiaofei He
Wanli Ouyang
SSL
28
44
0
12 Oct 2023
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
Haoyi Zhu
Honghui Yang
Xiaoyang Wu
Di Huang
Sha Zhang
...
Hengshuang Zhao
Chunhua Shen
Yu Qiao
Tong He
Wanli Ouyang
SSL
69
43
0
12 Oct 2023
Boundary Discretization and Reliable Classification Network for Temporal Action Detection
Zhenying Fang
Jun Yu
Richang Hong
18
0
0
10 Oct 2023
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning
Yinda Chen
Wei Huang
Shenglong Zhou
Qi Chen
Zhiwei Xiong
25
25
0
06 Oct 2023
Diffusion Models as Masked Audio-Video Learners
Elvis Nunez
Yanzi Jin
Mohammad Rastegari
Sachin Mehta
Maxwell Horton
20
2
0
05 Oct 2023
Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition
Hamid Reza Mohammadi
Ehsan Nazerfard
Tahereh Firoozi
ViT
23
2
0
04 Oct 2023
Multiple Physics Pretraining for Physical Surrogate Models
Michael McCabe
Bruno Régaldo-Saint Blancard
Liam Parker
Ruben Ohana
M. Cranmer
...
Francois Lanusse
Mariel Pettee
Tiberiu Teşileanu
Kyunghyun Cho
Shirley Ho
PINN
AI4CE
23
51
0
04 Oct 2023
A Spatio-Temporal Attention-Based Method for Detecting Student Classroom Behaviors
Fan Yang
27
2
0
04 Oct 2023