VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
    ViT

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 712 papers shown
Thoracic Surgery Video Analysis for Surgical Phase Recognition
S. Mateen
Niharika Malvia
Syed Abdul Khader
Danny Wang
Deepti Srinivasan
Chi-Fu Jeffrey Yang
Lana Schumacher
Sandeep Manjanna
11
0
0
13 Jun 2024
Towards Multilingual Audio-Visual Question Answering
Orchid Chetia Phukan
Priyabrata Mallick
Swarup Ranjan Behera
Aalekhya Satya Narayani
Arun Balaji Buduru
Rajesh Sharma
37
0
0
13 Jun 2024
A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder
Lixian Zhang
Yi Zhao
Runmin Dong
Jinxiao Zhang
Shuai Yuan
...
Weijia Li
Wei Liu
Wayne Zhang
Litong Feng
H. Fu
29
3
0
12 Jun 2024
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Elaheh Baharlouei
Mahsa Shafaei
Yigeng Zhang
Hugo Jair Escalante
Thamar Solorio
34
0
0
12 Jun 2024
Visual Representation Learning with Stochastic Frame Prediction
Huiwon Jang
Dongyoung Kim
Junsu Kim
Jinwoo Shin
Pieter Abbeel
Younggyo Seo
29
2
0
11 Jun 2024
Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
Donghu Kim
Hojoon Lee
Kyungmin Lee
Dongyoon Hwang
Jaegul Choo
OffRL
29
1
0
10 Jun 2024
CorrMAE: Pre-training Correspondence Transformers with Masked Autoencoder
Tangfei Liao
Xiaoqin Zhang
Guobao Xiao
Min Li
Tao Wang
Mang Ye
33
1
0
09 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Qifeng Chen
Y. Guo
VGen
97
16
0
06 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
37
1
0
05 Jun 2024
Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond
Jiahang Zhang
Lilang Lin
Shuai Yang
Jiaying Liu
SSL
41
0
0
05 Jun 2024
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Trevine Oorloff
Surya Koppisetti
Nicolò Bonettini
Divyaraj Solanki
Ben Colman
Yaser Yacoob
Ali Shahriyari
Gaurav Bharaj
30
20
0
05 Jun 2024
AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
Lorenzo Mur-Labadia
Ruben Martinez-Cantin
Jose J. Guerrero
G. Farinella
Antonino Furnari
27
4
0
03 Jun 2024
Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models
Georgia Markham
M. Balamurali
Andrew J. Hill
32
1
0
03 Jun 2024
DroneVis: Versatile Computer Vision Library for Drones
Ahmed Heakl
F. Youssef
Victor Parque
Walid Gomaa
AI4TS
32
1
0
01 Jun 2024
MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
Weichao Zhao
Hezhen Hu
Wen-gang Zhou
Yunyao Mao
Min Wang
Houqiang Li
SLR
31
8
0
31 May 2024
Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
Masashi Hatano
Ryo Hachiuma
Ryoske Fujii
Hideo Saito
EgoV
32
4
0
30 May 2024
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Haoxing Chen
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
...
Jun Lan
Huijia Zhu
Jianfu Zhang
Weiqiang Wang
Huaxiong Li
Mamba
80
13
0
30 May 2024
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
Ryoske Fujii
Masashi Hatano
Hideo Saito
Hiroki Kajita
29
5
0
30 May 2024
Visualizing the loss landscape of Self-supervised Vision Transformer
Youngwan Lee
Jeffrey Willette
Jonghee Kim
Sung Ju Hwang
ViT
20
1
0
28 May 2024
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
Mustafa Shukor
Matthieu Cord
61
5
0
26 May 2024
Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception
Shuangpeng Han
Ziyu Wang
Mengmi Zhang
24
0
0
26 May 2024
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Jiaqi Wang
VLM
29
40
0
25 May 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Alan L. Yuille
Cihang Xie
AI4TS
VGen
SSL
49
1
0
24 May 2024
SIAVC: Semi-Supervised Framework for Industrial Accident Video Classification
Zuoyong Li
Qinghua Lin
Haoyi Fan
Tiesong Zhao
David Zhang
24
0
0
23 May 2024
Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations
Mohammed Baharoon
Jonathan Klein
D. L. Michels
SSL
VLM
34
0
0
23 May 2024
BIMM: Brain Inspired Masked Modeling for Video Representation Learning
Zhifan Wan
Jie M. Zhang
Chang-bo Li
Shiguang Shan
60
0
0
21 May 2024
Open-Vocabulary Spatio-Temporal Action Detection
Tao Wu
Shuqiu Ge
Jie Qin
Gangshan Wu
Limin Wang
ObjD
23
5
0
17 May 2024
Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
Qi Jia
Baoyu Fan
Cong Xu
Lu Liu
Liang Jin
Guoguang Du
Zhenhua Guo
Yaqian Zhao
Xuanjing Huang
Rengang Li
31
0
0
15 May 2024
A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection
Matthew Korban
Peter Youngs
Scott T. Acton
ViT
27
6
0
13 May 2024
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset
Sushant Gautam
Mehdi Houshmand Sarkhoosh
Jan Held
Cise Midoglu
A. Cioppa
Silvio Giancola
Vajira Thambawita
Michael A. Riegler
P. Halvorsen
Mubarak Shah
24
4
0
12 May 2024
Learning Latent Dynamic Robust Representations for World Models
Ruixiang Sun
Hongyu Zang
Xin-hui Li
Riashat Islam
27
4
0
10 May 2024
HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions
Jiabin Chen
Fei Xu
Yikun Gu
Li Chen
Fangming Liu
Zhi Zhou
14
6
0
09 May 2024
Hierarchical Space-Time Attention for Micro-Expression Recognition
Haihong Hao
Shuo Wang
Huixia Ben
Yanbin Hao
Yansong Wang
Weiwei Wang
16
1
0
06 May 2024
MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
Vishal Nedungadi
A. Kariryaa
Stefan Oehmcke
Serge J. Belongie
Christian Igel
Nico Lang
30
24
0
04 May 2024
AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition
Meiqi Cao
Rui Yan
Xiangbo Shu
Guangzhao Dai
Yazhou Yao
Guo-Sen Xie
34
0
0
04 May 2024
Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers
Saahil Islam
Venkatesh N. Murthy
Dominik Neumann
B. K. Das
Puneet Sharma
Andreas K. Maier
D. Comaniciu
Florin-Cristian Ghesu
19
1
0
02 May 2024
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition
Peihao Xiang
Chaohao Lin
Kaida Wu
Ou Bai
27
3
0
28 Apr 2024
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Licai Sun
Zhuofan Wen
Siyuan Zhang
...
Bin Liu
Erik Cambria
Guoying Zhao
Björn W. Schuller
Jianhua Tao
VLM
31
11
0
26 Apr 2024
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Xiaohong Liu
Xiongkuo Min
Guangtao Zhai
Chunyi Li
Tengchuan Kou
...
Qi Yan
Youran Qu
Xiaohui Zeng
Lele Wang
Renjie Liao
48
29
0
25 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
30
37
0
24 Apr 2024
MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis
Jiaxin Zhuang
Linshan Wu
Qiong Wang
V. Vardhanabhuti
Lin Luo
Hao Chen
49
4
0
24 Apr 2024
On the Content Bias in Fréchet Video Distance
Jason S. Hoffman
Aniruddha Mahapatra
Gaurav Parmar
Jun-Yan Zhu
Jia-Bin Huang
EGVM
50
15
0
18 Apr 2024
Predicting Long-horizon Futures by Conditioning on Geometry and Time
Tarasha Khurana
Deva Ramanan
AI4TS
26
0
0
17 Apr 2024
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results
Xin Li
Kun Yuan
Yajing Pei
Yiting Lu
Ming-hui Sun
...
Kele Xu
Qisheng Xu
Tao Sun
Zhi-Guo Ding
Yuhan Hu
41
23
0
17 Apr 2024
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar
Arya Bakhtiar
Danny Tran
Antonio Loquercio
Jathushan Rajasegaran
Yann LeCun
Amir Globerson
Trevor Darrell
EgoV
33
4
0
15 Apr 2024
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
22
0
0
15 Apr 2024
The 8th AI City Challenge
Shuo Wang
D. Anastasiu
Zhenghang Tang
Ming-Ching Chang
Yue Yao
...
Xunlei Wu
S. Pusegaonkar
Yizhou Wang
Sujit Biswas
Rama Chellappa
28
31
0
15 Apr 2024
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Diana-Nicoleta Grigore
Mariana-Iuliana Georgescu
J. A. Justo
T. Johansen
Andreea-Iuliana Ionescu
Radu Tudor Ionescu
21
0
0
14 Apr 2024
An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video
Xingyu Song
Zhan Li
Shi Chen
Xin-Qiang Cai
K. Demachi
26
2
0
10 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
81
88
0
08 Apr 2024