2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Papers citing
"VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 712 papers shown
Title
Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Kanchana Ranasinghe
Michael S. Ryoo
SSL
VLM
26
12
0
20 Jul 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
19
136
0
20 Jul 2023
Actor-agnostic Multi-label Action Recognition with Multi-modal Query
Anindya Mondal
Sauradip Nag
J. Prada
Xiatian Zhu
Anjan Dutta
16
9
0
20 Jul 2023
Learning Discriminative Visual-Text Representation for Polyp Re-Identification
Suncheng Xiang
Can Liu
Sijia Du
Dahong Qian
32
1
0
20 Jul 2023
Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection
Guangzhi Wang
Yangyang Guo
Mohan S. Kankanhalli
24
0
0
19 Jul 2023
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
OCL
LRM
ViT
28
3
0
17 Jul 2023
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
Hongfei Yan
Y. Liu
Yushen Wei
Z. Li
Guanbin Li
Liang Lin
21
40
0
17 Jul 2023
Masked Autoencoders for Unsupervised Anomaly Detection in Medical Images
Mariana-Iuliana Georgescu
MedIm
17
7
0
14 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
25
244
0
13 Jul 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
32
4
0
10 Jul 2023
SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks
Xingyu Lin
John So
Sashwat Mahalingam
Fangchen Liu
Pieter Abbeel
SSL
22
20
0
07 Jul 2023
It is not Sexually Suggestive, It is Educative. Separating Sex Education from Suggestive Content on TikTok Videos
Enfa George
Mihai Surdeanu
8
1
0
06 Jul 2023
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Liangzhe Yuan
N. B. Gundavarapu
Long Zhao
Hao Zhou
Yin Cui
...
Florian Schroff
Hartwig Adam
Ming Yang
Ting Liu
Boqing Gong
ELM
32
9
0
06 Jul 2023
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition
Licai Sun
Zheng Lian
B. Liu
Jianhua Tao
11
17
0
05 Jul 2023
Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning
Xiang Li
Varun Belagali
Jinghuan Shang
Michael S. Ryoo
32
28
0
04 Jul 2023
Human-to-Human Interaction Detection
Zhenhua Wang
Kaining Ying
Jiajun Meng
J. Ning
22
2
0
02 Jul 2023
SpotEM: Efficient Video Search for Episodic Memory
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
VLM
26
9
0
28 Jun 2023
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023
Zhijian Hou
Lei Ji
Difei Gao
Wanjun Zhong
Kun Yan
C. Li
W. Chan
Chong-Wah Ngo
Nan Duan
Mike Zheng Shou
11
15
0
27 Jun 2023
MAE-GEBD:Winning the CVPR'2023 LOVEU-GEBD Challenge
Yuanxi Sun
Ruifei He
Youzeng Li
Zuwei Huang
Feng Hu
Xu Cheng
Jie Tang
11
1
0
27 Jun 2023
Variance-Covariance Regularization Improves Representation Learning
Jiachen Zhu
Katrina Evtimova
Yubei Chen
Ravid Shwartz-Ziv
Yann LeCun
SSL
18
7
0
23 Jun 2023
FuXi: A cascade machine learning forecasting system for 15-day global weather forecast
Lei Chen
Xiaohui Zhong
Feng-jun Zhang
Yuan-Chia Cheng
Yinghui Xu
Yuan Qi
Hao Li
AI4Cl
20
197
0
22 Jun 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
30
14
0
20 Jun 2023
Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023
Jiayi Shao
Xiaohan Wang
Ruijie Quan
Yezhou Yang
EgoV
19
8
0
15 Jun 2023
A Large-Scale Analysis on Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Y. S. Rawat
SSL
16
3
0
09 Jun 2023
FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow
Zhaoyang Huang
Xiaoyu Shi
Chao Zhang
Qiang Wang
Yijin Li
Hongwei Qin
Jifeng Dai
Xiaogang Wang
Hongsheng Li
22
4
0
08 Jun 2023
Optimizing ViViT Training: Time and Memory Reduction for Action Recognition
Shreyank N. Gowda
Anurag Arnab
Jonathan Huang
ViT
16
4
0
07 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
25
21
0
06 Jun 2023
VR.net: A Real-world Dataset for Virtual Reality Motion Sickness Research
Elliott Wen
Chitralekha Gupta
P. Sasikumar
Mark Billinghurst
James P Wilmott
Emily Skow
Arindam Dey
Suranga Nanayakkara
11
11
0
06 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
37
1
0
04 Jun 2023
VideoComposer: Compositional Video Synthesis with Motion Controllability
Xiang Wang
Hangjie Yuan
Shiwei Zhang
Dayou Chen
Jiuniu Wang
Yingya Zhang
Yujun Shen
Deli Zhao
Jingren Zhou
VGen
DiffM
25
315
0
03 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
21
0
0
02 Jun 2023
Unifying (Machine) Vision via Counterfactual World Modeling
Daniel M. Bear
Kevin T. Feigelis
Honglin Chen
Wanhee Lee
R. Venkatesh
Klemen Kotar
Alex Durango
Daniel L. K. Yamins
VGen
23
12
0
02 Jun 2023
HomE: Homography-Equivariant Video Representation Learning
Anirudh Sriram
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
L. Fei-Fei
Ehsan Adeli
SSL
AI4TS
18
2
0
02 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
41
158
0
01 Jun 2023
On Masked Pre-training and the Marginal Likelihood
Pablo Moreno-Muñoz
Pol G. Recasens
Søren Hauberg
SSL
25
5
0
01 Jun 2023
VIPriors 3: Visual Inductive Priors for Data-Efficient Deep Learning Challenges
Robert-Jan Bruintjes
A. Lengyel
Marcos Baptista-Rios
O. Kayhan
Davide Zambrano
Nergis Tomen
J. C. V. Gemert
10
9
0
31 May 2023
Benchmarking Diverse-Modal Entity Linking with Generative Models
Sijia Wang
A. Li
He Zhu
Shenmin Zhang
Chung-Wei Hang
...
William Wang
Zhiguo Wang
Vittorio Castelli
Bing Xiang
Patrick K. L. Ng
VLM
33
8
0
27 May 2023
Action Sensitivity Learning for Temporal Action Localization
Jiayi Shao
Xiaohan Wang
Ruijie Quan
Junjun Zheng
Jiang Yang
Yezhou Yang
21
22
0
25 May 2023
Siamese Masked Autoencoders
Agrim Gupta
Jiajun Wu
Jia Deng
Li Fei-Fei
20
48
0
23 May 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
24
9
0
23 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
92
76
0
22 May 2023
Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning
Xiaoxiao Sheng
Zhiqiang Shen
Gang Xiao
3DPC
SSL
28
6
0
22 May 2023
Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition
Nana Li
M. Bennis
Alexandros Iosifidis
Qi Zhang
11
3
0
22 May 2023
Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity
Zijiao Chen
Jiaxin Qing
J. Zhou
DiffM
VGen
18
54
0
19 May 2023
SurgMAE: Masked Autoencoders for Long Surgical Video Analysis
Muhammad Abdullah Jamal
Omid Mohareri
13
5
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
16
114
0
18 May 2023
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot
Aanisha Bhattacharya
Yaman Kumar Singla
Balaji Krishnamurthy
R. Shah
Changyou Chen
VGen
19
11
0
16 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
24
2
0
13 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
19
45
0
10 May 2023
VideoChat: Chat-Centric Video Understanding
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
35
526
0
10 May 2023