arXiv:2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 712 papers shown
Title
HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks
Maria Pilligua
Danna Xue
Javier Vázquez-Corral
45
0
0
21 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
59
0
0
20 Mar 2025
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data
Haozhe Si
Yuxuan Wan
Minh Do
Deepak Vasisht
Han Zhao
Hendrik Hamann
41
0
0
17 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
48
1
0
17 Mar 2025
Multi Activity Sequence Alignment via Implicit Clustering
Taein Kwon
Zador Pataki
Mahdi Rad
Marc Pollefeys
HAI
AI4TS
60
0
0
16 Mar 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
47
0
0
16 Mar 2025
Domain Generalization for Improved Human Activity Recognition in Office Space Videos Using Adaptive Pre-processing
Partho Ghosh
Raisa Bentay Hossain
Mohammad Zunaed
Taufiq Hasan
48
0
0
16 Mar 2025
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
Haonan Wang
Qixiang Zhang
Lehan Wang
Xuanqi Huang
Xiaomeng Li
VOS
VGen
55
0
0
14 Mar 2025
A Large-Scale Study on Video Action Dataset Condensation
Yang Chen
Sheng Guo
Bo Zheng
Limin Wang
DD
77
2
0
13 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
65
3
0
13 Mar 2025
Semantic Latent Motion for Portrait Video Generation
Qiyuan Zhang
Chenyu Wu
Wenzhang Sun
Huaize Liu
Donglin Di
Wei Chen
Changqing Zou
VGen
67
0
0
13 Mar 2025
SignRep: Enhancing Self-Supervised Sign Representations
Ryan Wong
Necati Cihan Camgöz
Richard Bowden
SLR
53
0
0
11 Mar 2025
Self-supervised Normality Learning and Divergence Vector-guided Model Merging for Zero-shot Congenital Heart Disease Detection in Fetal Ultrasound Videos
Pramit Saha
Divyanshu Mishra
Netzahualcoyotl Hernandez-Cruz
Olga Patey
A. Papageorghiou
Yuki M. Asano
J. A. Noble
38
0
0
10 Mar 2025
COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition
Baiyu Chen
Wilson Wongso
Zechen Li
Yonchanok Khaokaew
Hao Xue
Flora D. Salim
56
0
0
10 Mar 2025
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Chen-Da Liu-Zhang
Lin Sui
Shuming Liu
Fangzhou Mu
Z. Wang
Bernard Ghanem
44
1
0
09 Mar 2025
End-to-End Action Segmentation Transformer
Tieqiao Wang
Sinisa Todorovic
ViT
37
0
0
08 Mar 2025
OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking
Franklin Mingzhe Li
Kaitlyn Ng
Bin Zhu
Patrick Carrington
40
0
0
07 Mar 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
37
0
0
04 Mar 2025
A General Purpose Spectral Foundational Model for Both Proximal and Remote Sensing Spectral Imaging
William Michael Laprade
Jesper Cairo Westergaard
Svend Christensen
Mads Nielsen
Anders Bjorholm Dahl
63
0
0
03 Mar 2025
Learning to Animate Images from A Few Videos to Portray Delicate Human Actions
Haoxin Li
Yingchen Yu
Qilong Wu
Hanwang Zhang
Boyang Li
Song Bai
3DH
VGen
66
0
0
01 Mar 2025
Anatomically-guided masked autoencoder pre-training for aneurysm detection
Alberto Mario Ceballos-Arroyo
Jisoo Kim
C. Lin
Lei Qin
Geoffrey S. Young
Huaizu Jiang
ViT
MedIm
28
0
0
28 Feb 2025
Soften the Mask: Adaptive Temporal Soft Mask for Efficient Dynamic Facial Expression Recognition
Mengzhu Li
Quanxing Zha
Hongjun Wu
CVBM
48
0
0
28 Feb 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu
Chen Zhao
Fatimah Zohra
Mattia Soldan
Alejandro Pardo
...
Juan Carlos León Alcázar
A. Cioppa
Silvio Giancola
Carlos Hinojosa
Bernard Ghanem
55
3
0
27 Feb 2025
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Pierre Vuillecard
J. Odobez
57
0
0
27 Feb 2025
EndoMamba: An Efficient Foundation Model for Endoscopic Videos
Qingyao Tian
Huai Liao
Xinyan Huang
Bingyu Yang
Dongdong Lei
Sebastien Ourselin
Hongbin Liu
Mamba
68
0
0
26 Feb 2025
Multispectral to Hyperspectral using Pretrained Foundational model
Ruben Gonzalez
C. Albrecht
Nassim Ait Ali Braham
Devyani Lambhate
Joao Lucas de Sousa Almeida
P. Fraccaro
Benedikt Blumenstiel
Thomas Brunschwiler
Ranjini Bangalore
59
0
0
26 Feb 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu
Huan Zhang
AAML
72
0
0
25 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
57
8
0
24 Feb 2025
Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity
Yizhuo Lu
Changde Du
Chong Wang
Xuanliu Zhu
Liuyun Jiang
Xujin Li
Huiguang He
VGen
105
4
0
20 Feb 2025
L4P: Low-Level 4D Vision Perception Unified
Abhishek Badki
Hang Su
Bowen Wen
Orazio Gallo
VLM
75
1
0
18 Feb 2025
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni
Yuxin Guo
Yichen Liu
Rui Chen
Lewei Lu
Z. Wu
DiffM
VGen
59
3
0
17 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
X. Zhang
Jingyu Wang
Wenbing Tao
52
4
0
17 Feb 2025
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
Ruoxuan Feng
Jiangyu Hu
Wenke Xia
Tianci Gao
Ao Shen
Yuhao Sun
Bin Fang
Di Hu
42
3
0
15 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
95
4
0
12 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
112
0
0
12 Feb 2025
BRIDLE: Generalized Self-supervised Learning with Quantization
Hoang M. Nguyen
Satya Narayan Shukla
Qiang Zhang
Hanchao Yu
Sreya D. Roy
Taipeng Tian
Lingjiong Zhu
Yuchen Liu
SSL
MQ
72
0
0
04 Feb 2025
FireCastNet: Earth-as-a-Graph for Seasonal Fire Prediction
Dimitrios Michail
Charalampos Davalas
Lefki-Ioanna Panagiotou
Ioannis Prapas
Spyros Kondylatos
N. Bountos
Ioannis Papoutsis
40
0
0
03 Feb 2025
Data-efficient Performance Modeling via Pre-training
Chunting Liu
Riyadh Baghdadi
39
0
0
24 Jan 2025
A Novel Tracking Framework for Devices in X-ray Leveraging Supplementary Cue-Driven Self-Supervised Features
Saahil Islam
Venkatesh N. Murthy
Dominik Neumann
Serkan Cimen
Puneet Sharma
Andreas K. Maier
D. Comaniciu
Florin-Cristian Ghesu
29
0
0
22 Jan 2025
Slot-BERT: Self-supervised Object Discovery in Surgical Video
Guiqiu Liao
M. Jogan
Marcel Hussing
Kenta Nakahashi
Kazuhiro Yasufuku
Amin Madani
Eric Eaton
Daniel A. Hashimoto
53
0
0
21 Jan 2025
MetaNeRV: Meta Neural Representations for Videos with Spatial-Temporal Guidance
Jialong Guo
Ke Liu
Jiangchao Yao
Zhihua Wang
Jiajun Bu
Haishuai Wang
AI4TS
40
0
0
20 Jan 2025
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
R. Yasarla
Manish Kumar Singh
Hong Cai
Yunxiao Shi
Jisoo Jeong
Yinhao Zhu
Shizhong Han
Risheek Garrepalli
Fatih Porikli
MDE
80
5
0
17 Jan 2025
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision
Diego A. Velázquez
Pau Rodríguez López
Sergio Alonso
Josep M. Gonfaus
Jordi Gonzalez
Gerardo Richarte
Javier Marin
Yoshua Bengio
Alexandre Lacoste
44
0
0
14 Jan 2025
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation
Zixuan Chen
Jing Huo
Yangtao Chen
Yang Gao
43
2
0
11 Jan 2025
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Yuwei Fang
Kwot Sin Lee
Ivan Skorokhodov
Kfir Aberman
Jun-Yan Zhu
Ming Yang
Sergey Tulyakov
DiffM
VGen
69
7
0
10 Jan 2025
Edit as You See: Image-guided Video Editing via Masked Motion Modeling
Zhi-Lin Huang
Y. Liu
Chujun Qin
Z. Wang
Dong Zhou
Dong Li
E. Barsoum
DiffM
VGen
41
0
0
08 Jan 2025
CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets
Tanay Agrawal
Mohammed Guermal
Michal Balazia
François Brémond
21
0
0
08 Jan 2025
MVP: Multimodal Emotion Recognition based on Video and Physiological Signals
Valeriya Strizhkova
Hadi Kachmar
Hava Chaptoukaev
Raphael Kalandadze
Natia Kukhilava
...
Maria A. Zuluaga
Michal Balazia
A. Dantcheva
François Brémond
Laura M. Ferrari
30
0
0
06 Jan 2025
PiLaMIM: Toward Richer Visual Representations by Integrating Pixel and Latent Masked Image Modeling
Junmyeong Lee
Eui Jun Hwang
Sukmin Cho
Jong C. Park
27
0
0
06 Jan 2025
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
Rui Liu
Hongyu Yuan
H. Li
38
0
0
03 Jan 2025