2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Papers citing
"VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 712 papers shown
Title
Advanced Gesture Recognition in Autism: Integrating YOLOv7, Video Augmentation and VideoMAE for Video Analysis
Amit Kumar Singh
Trapti Shrivastava
Vrijendra Singh
16
0
0
12 Oct 2024
Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
Sehun Kim
18
1
0
11 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu
Honghui Yang
Yating Wang
Jiange Yang
Limin Wang
Tong He
3DH
43
5
0
10 Oct 2024
The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024
Yinan Han
Qingyuan Jiang
Hongming Mei
Yang Yang
Jinhui Tang
17
0
0
08 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
30
14
0
08 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Mohit Bansal
Koustuv Sinha
AI4TS
49
3
0
04 Oct 2024
AirLetters: An Open Video Dataset of Characters Drawn in the Air
Rishit Dagli
Guillaume Berger
Joanna Materzynska
Ingo Bax
Roland Memisevic
VGen
14
1
0
03 Oct 2024
An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos
Arun V. Reddy
Ketul Shah
Corban Rivera
William Paul
Celso M. De Melo
Rama Chellappa
SLR
16
0
0
03 Oct 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Zae Myung Kim
Dongyeop Kang
Yao-Yi Chiang
Alfred Hero
23
0
0
02 Oct 2024
Pre-training with Synthetic Patterns for Audio
Yuchi Ishikawa
Tatsuya Komatsu
Yoshimitsu Aoki
18
0
0
01 Oct 2024
TikGuard: A Deep Learning Transformer-Based Solution for Detecting Unsuitable TikTok Content for Kids
Mazen Balat
Mahmoud Essam Gabr
Hend Bakr
A. Zaky
16
1
0
01 Oct 2024
Loose Social-Interaction Recognition in Real-world Therapy Scenarios
Abid Ali
Rui Dai
Ashish Marisetty
Guillaume Astruc
Monique Thonnat
J. Odobez
Susanne Thümmler
Francois Bremond
29
1
0
30 Sep 2024
CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis
Nishq Poorav Desai
Ali Etemad
Michael A. Greenspan
23
0
0
30 Sep 2024
Solution for Temporal Sound Localisation Task of ECCV Second Perception Test Challenge 2024
Haowei Gu
Weihao Zhu
Yang Yang
20
0
0
29 Sep 2024
Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis
Shukesh Reddy
Nishit Poddar
Srijan Das
Abhijit Das
CVBM
20
0
0
29 Sep 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
28
6
0
27 Sep 2024
How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks?
Jose Sosa
Mohamed Aloulou
Danila Rukhovich
Rim Sleimi
Boonyarit Changaival
Anis Kacem
Djamila Aouada
25
0
0
27 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
22
0
0
26 Sep 2024
Interpretable Action Recognition on Hard to Classify Actions
Anastasia Anichenko
Frank Guerin
Andrew Gilbert
16
0
0
19 Sep 2024
Across-Game Engagement Modelling via Few-Shot Learning
Kosmas Pinitas
Konstantinos Makantasis
Georgios N. Yannakakis
24
1
0
19 Sep 2024
Self-Supervised Pre-training Tasks for an fMRI Time-series Transformer in Autism Detection
Yinchi Zhou
Peiyu Duan
Yuexi Du
Nicha Dvornek
MedIm
13
1
0
18 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Bilal Faye
Hanane Azzag
M. Lebbah
ObjD
23
0
0
17 Sep 2024
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Lehong Wu
Lilang Lin
Jiahang Zhang
Y. Ma
Jiaying Liu
DiffM
46
0
0
16 Sep 2024
Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better
Mengying Ge
Mingyang Li
Dongkai Tang
Pengbo Li
Kuo Liu
Shuhao Deng
Songbai Pu
L. Liu
Yang Song
Tao Zhang
23
0
0
12 Sep 2024
Data Collection-free Masked Video Modeling
Yuchi Ishikawa
Masayoshi Kondo
Yoshimitsu Aoki
ViT
19
1
0
10 Sep 2024
UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity
Yicheng Fu
R. Anantha
Prabal Vashisht
Jianpeng Cheng
Etai Littwin
26
2
0
06 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
70
15
0
05 Sep 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
Zhuolin Tan
Chenqiang Gao
Anyong Qin
Ruixin Chen
Tiecheng Song
Feng Yang
Deyu Meng
14
0
0
02 Sep 2024
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models
Y. Guo
Faizan Siddiqui
Yang Zhao
Rama Chellappa
Shao-Yuan Lo
LRM
24
2
0
31 Aug 2024
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning
Zhiyuan Yan
Yandan Zhao
Shen Chen
Xinghe Fu
Taiping Yao
Shouhong Ding
Li Yuan
30
8
0
30 Aug 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng
Di Fu
Baole Wei
Yong Cao
Liangcai Gao
Zhi Tang
ViT
30
1
0
30 Aug 2024
Online pre-training with long-form videos
Itsuki Kato
Kodai Kamiya
Toru Tamaki
OnRL
24
0
0
28 Aug 2024
Fine-grained length controllable video captioning with ordinal embeddings
Tomoya Nitta
Takumi Fukuzawa
Toru Tamaki
25
0
0
27 Aug 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models
Zejia Weng
Xitong Yang
Zhen Xing
Zuxuan Wu
Yu-Gang Jiang
VGen
DiffM
30
5
0
27 Aug 2024
MMASD+: A Novel Dataset for Privacy-Preserving Behavior Analysis of Children with Autism Spectrum Disorder
Pavan Uttej Ravva
Behdokht Kiafar
Pinar Kullu
Jicheng Li
Anjana Bhat
R. Barmaki
29
0
0
27 Aug 2024
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
Wentao Wu
Fanghua Hong
Xiao Wang
Chenglong Li
Jin Tang
VLM
41
1
0
23 Aug 2024
Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?
Chen Liang
Qiang Guo
Xiaochao Qu
Luoqi Liu
Ting Liu
VOS
32
0
0
20 Aug 2024
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
Zebang Cheng
Shuyuan Tu
Dawei Huang
Minghan Li
Xiaojiang Peng
Zhi-Qi Cheng
Alexander G. Hauptmann
43
2
0
20 Aug 2024
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
Alex N. Wang
Christopher Hoang
Yuwen Xiong
Yann LeCun
Mengye Ren
64
0
0
20 Aug 2024
An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs
Eui Jun Hwang
Sukmin Cho
Junmyeong Lee
Jong C. Park
SLR
59
4
0
20 Aug 2024
VrdONE: One-stage Video Visual Relation Detection
Xinjie Jiang
Chenxi Zheng
Xuemiao Xu
Bangzhen Liu
Weiying Zheng
Huaidong Zhang
Shengfeng He
VGen
VOS
37
3
0
18 Aug 2024
Flatten: Video Action Recognition is an Image Classification task
Junlin Chen
Chengcheng Xu
Yangfan Xu
Jian Yang
Jun Yu Li
Zhiping Shi
18
1
0
17 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
27
1
0
13 Aug 2024
Membership Inference Attack Against Masked Image Modeling
Z. Li
Xinlei He
Ning Yu
Yang Zhang
38
1
0
13 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
59
6
0
13 Aug 2024
Deep Multimodal Collaborative Learning for Polyp Re-Identification
Suncheng Xiang
Jincheng Li
Zhengjie Zhang
Shilun Cai
Jiale Guan
Dahong Qian
20
0
0
12 Aug 2024
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
Rex Liu
Xin Liu
18
1
0
08 Aug 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Seok Hwan Lee
Taein Son
Soo Won Seo
Jisong Kim
Jun Won Choi
37
0
0
07 Aug 2024
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation
Xiaofeng Mao
Zhengkai Jiang
Qilin Wang
Chencan Fu
Jiangning Zhang
Jiafu Wu
Yabiao Wang
Chengjie Wang
Wei Li
Mingmin Chi
70
4
0
06 Aug 2024
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation
Xin Liu
Chao Hao
Zitong Yu
Huanjing Yue
Jingyu Yang
23
1
0
05 Aug 2024