ResearchTrend.AI
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
    ViT

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 712 papers shown
Towards Privacy-Aware Sign Language Translation at Scale
Phillip Rust
Bowen Shi
Skyler Wang
Necati Cihan Camgöz
Jean Maillard
SLR
37
14
0
14 Feb 2024
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos
Yang Qian
Yinan Sun
A. Kargarandehkordi
Parnian Azizian
O. Mutlu
Saimourya Surabhi
Pingyi Chen
Zain Jabbar
Dennis Paul Wall
Peter Washington
OffRL
19
1
0
14 Feb 2024
Leveraging Self-Supervised Instance Contrastive Learning for Radar Object Detection
Colin Decourt
R. V. Rullen
D. Salle
Thomas Oberlin
SSL
28
0
0
13 Feb 2024
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao
Xin Lin
Qin Ni
Liang He
16
3
0
12 Feb 2024
NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties
Jingyuan Sun
Mingxiao Li
Zijiao Chen
Marie-Francine Moens
VGen
26
7
0
02 Feb 2024
Multi-Modal Machine Learning Framework for Automated Seizure Detection in Laboratory Rats
Aaron D. Mullen
Samuel E. Armstrong
Jasmine Perdeh
Bjorn Bauer
Jeff Talbert
V. Bumgardner
17
0
0
01 Feb 2024
Machine Unlearning for Image-to-Image Generative Models
Guihong Li
Hsiang Hsu
Chun-Fu Chen
R. Marculescu
MU
VLM
64
25
0
01 Feb 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
F. Worgotter
Alexander S. Ecker
28
3
0
29 Jan 2024
MV2MAE: Multi-View Video Masked Autoencoders
Ketul Shah
Robert Crandall
Jie Xu
Peng Zhou
Marian George
Mayank Bansal
Rama Chellappa
20
4
0
29 Jan 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang
Xiaohan Ding
Kaixiong Gong
Yixiao Ge
Ying Shan
Xiangyu Yue
ViT
16
7
0
25 Jan 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
26
14
0
25 Jan 2024
Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces
Juan Hu
Xin Liao
Difei Gao
Satoshi Tsutsui
Qian Wang
Zheng Qin
Mike Zheng Shou
21
4
0
24 Jan 2024
GTAutoAct: An Automatic Datasets Generation Framework Based on Game Engine Redevelopment for Action Recognition
Xingyu Song
Zhan Li
Shi Chen
K. Demachi
19
1
0
24 Jan 2024
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)
Shih-Han Chou
Matthew Kowal
Yasmin Niknam
Diana Moyano
Shayaan Mehdi
...
Cheng Zhang
Ian Knopke
S. Kocak
Leonid Sigal
Yalda Mohsenzadeh
25
1
0
23 Jan 2024
Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification
Jimmy Lin
Junkai Li
Jiasi Gao
Weizhi Ma
Yang Liu
18
0
0
21 Jan 2024
Understanding Video Transformers via Universal Concept Discovery
M. Kowal
Achal Dave
Rares Ambrus
Adrien Gaidon
Konstantinos G. Derpanis
P. Tokmakov
ViT
27
8
0
19 Jan 2024
Learning to Visually Connect Actions and their Effects
Eric Peh
Paritosh Parmar
Basura Fernando
22
2
0
19 Jan 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Chenyu Wang
Weixin Luo
Qianyu Chen
Haonan Mai
Jindi Guo
Sixun Dong
Xiaohua Xuan
MLLM
LLMAG
44
19
0
19 Jan 2024
Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder
Yongchen Zhou
Richard Jiang
11
0
0
18 Jan 2024
CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding
Yunze Liu
Changxi Chen
Zifan Wang
Li Yi
3DPC
23
3
0
17 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie M. Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
56
0
0
15 Jan 2024
HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition
Licai Sun
Zheng Lian
Bin Liu
Jianhua Tao
51
29
0
11 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
25
0
0
10 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
30
6
0
08 Jan 2024
Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao
Shuming Liu
K. Mangalam
Guocheng Qian
Fatimah Zohra
Abdulmohsen Alghannam
Jitendra Malik
Bernard Ghanem
38
3
0
08 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
19
4
0
08 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Licai Sun
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Bin Liu
Jianhua Tao
11
12
0
07 Jan 2024
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu
Yifei Huang
Junlin Hou
Guo Chen
Yue Zhang
Rui Feng
Weidi Xie
EgoV
34
28
0
01 Jan 2024
Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence
Ruizhuo Xu
Linzhi Huang
Mei Wang
Jiani Hu
Weihong Deng
ViT
MedIm
27
1
0
01 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
29
13
0
31 Dec 2023
SVFAP: Self-supervised Video Facial Affect Perceiver
Licai Sun
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Bin Liu
Jianhua Tao
42
14
0
31 Dec 2023
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
47
4
0
29 Dec 2023
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
50
81
0
29 Dec 2023
SAIC: Integration of Speech Anonymization and Identity Classification
Ming Cheng
Xingjian Diao
Shitong Cheng
Wenjun Liu
43
6
0
23 Dec 2023
CaptainCook4D: A dataset for understanding errors in procedural activities
Rohith Peddi
Shivvrat Arya
B. Challa
Likhitha Pallapothula
Akshay Vyas
...
Vasundhara Komaragiri
Eric D. Ragan
Nicholas Ruozzi
Yu Xiang
Vibhav Gogate
50
8
0
22 Dec 2023
Bootstrap Masked Visual Modeling via Hard Patches Mining
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tiancai Wang
Xiangyu Zhang
Zhaoxiang Zhang
34
5
0
21 Dec 2023
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
I. Dave
Simon Jenni
Mubarak Shah
25
7
0
20 Dec 2023
M-BEV: Masked BEV Perception for Robust Autonomous Driving
Siran Chen
Yue Ma
Yu Qiao
Yali Wang
19
8
0
19 Dec 2023
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar
Yongqin Xian
A. Tonioni
Andrew Zisserman
Federico Tombari
28
12
0
19 Dec 2023
T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
Weijie Wei
F. Karimi Nejadasl
Theo Gevers
Martin R. Oswald
3DPC
23
3
0
15 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
35
3
0
15 Dec 2023
Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception
Xiao Wang
Wentao Wu
Chenglong Li
Zhicheng Zhao
Zhe Chen
Yukai Shi
Jin Tang
38
4
0
15 Dec 2023
Semi-supervised Semantic Segmentation Meets Masked Modeling:Fine-grained Locality Learning Matters in Consistency Regularization
W. Pan
Zhe Xu
Jiangpeng Yan
Zihan Wu
R. Tong
Xiu Li
Jianhua Yao
ISeg
24
1
0
14 Dec 2023
Survey on Foundation Models for Prognostics and Health Management in Industrial Cyber-Physical Systems
Ruonan Liu
Quanhu Zhang
Te Han
AI4CE
25
2
0
11 Dec 2023
Counterfactual World Modeling for Physical Dynamics Understanding
Rahul Venkatesh
Honglin Chen
Kevin T. Feigelis
Daniel M. Bear
Khaled Jedoui
...
Wanhee Lee
Sherry Liu
Kevin A. Smith
Judith E. Fan
Daniel L. K. Yamins
VGen
38
1
0
11 Dec 2023
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
17
37
0
11 Dec 2023
From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos
Yin Chen
Jia Li
Shiguang Shan
Meng Wang
Richang Hong
46
32
0
09 Dec 2023
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
33
15
0
07 Dec 2023
A brief introduction to a framework named Multilevel Guidance-Exploration Network
Guoqing Yang
Zhiming Luo
Jianzhe Gao
Yingxin Lai
Kun Yang
Yifan He
Shaozi Li
3DH
24
0
0
07 Dec 2023
Deep Multimodal Fusion for Surgical Feedback Classification
Rafal Kocielnik
Elyssa Y. Wong
Timothy N. Chu
Lydia Lin
De-An Huang
Jiayun Wang
A. Anandkumar
Andrew J. Hung
11
2
0
06 Dec 2023