2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Papers citing
"VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 712 papers shown
Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning
Mahsa Ehsanpour
Ian Reid
Hamid Rezatofighi
ViT
27
0
0
08 Apr 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
27
8
0
08 Apr 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng
Yujie Zhong
Chengjian Feng
Lin Ma
53
7
0
07 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
34
3
0
06 Apr 2024
SalFoM: Dynamic Saliency Prediction with Video Foundation Models
Morteza Moradi
Mohammad Moradi
Francesco Rundo
C. Spampinato
Ali Borji
S. Palazzo
33
1
0
03 Apr 2024
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad
Sergey Zakharov
Vitor Campagnolo Guizilini
Adrien Gaidon
Z. Kira
Rares Ambrus
ViT
32
12
0
01 Apr 2024
360+x: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen
Yuqi Hou
Chenyuan Qu
Irene Testini
Xiaohan Hong
Jianbo Jiao
22
6
0
01 Apr 2024
HypeBoy: Generative Self-Supervised Representation Learning on Hypergraphs
Sunwoo Kim
Shinhwan Kang
Fanchen Bu
Soo Yong Lee
Jaemin Yoo
Kijung Shin
SSL
16
11
0
31 Mar 2024
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu
Chen Li
Haoran Tang
Yixiao Ge
Ying Shan
Ge Li
27
68
0
30 Mar 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
27
2
0
28 Mar 2024
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël
Renaud Vandeghen
A. Cioppa
Silvio Giancola
Bernard Ghanem
Marc Van Droogenbroeck
ViT
36
6
0
26 Mar 2024
Elysium: Exploring Object-level Perception in Videos via MLLM
Hang Wang
Yanjie Wang
Yongjie Ye
Yuxiang Nie
Can Huang
MLLM
32
18
0
25 Mar 2024
Adversarially Masked Video Consistency for Unsupervised Domain Adaptation
Xiaoyu Zhu
Junwei Liang
Po-Yao Huang
Alex Hauptmann
30
1
0
24 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
32
1
0
24 Mar 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
31
2
0
24 Mar 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang
Guo Chen
Jilan Xu
Mingfang Zhang
Lijin Yang
...
Hongjie Zhang
Lu Dong
Yali Wang
Limin Wang
Yu Qiao
EgoV
54
35
0
24 Mar 2024
Edit3K: Universal Representation Learning for Video Editing Components
Xin Gu
Libo Zhang
Fan Chen
Longyin Wen
Yufei Wang
Tiejian Luo
Sijie Zhu
35
4
0
24 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
27
44
0
22 Mar 2024
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul M. Chilimbi
VLM
AI4TS
43
4
0
21 Mar 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
49
7
0
21 Mar 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe
ViT
40
12
0
20 Mar 2024
N-Modal Contrastive Losses with Applications to Social Media Data in Trimodal Space
William Theisen
Walter J. Scheirer
17
1
0
18 Mar 2024
A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition
Abhi Kamboj
Minh Do
22
1
0
17 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
72
0
14 Mar 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Jongsuk Kim
Hyeongkeun Lee
Kyeongha Rho
Junmo Kim
Joon Son Chung
16
4
0
14 Mar 2024
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
36
2
0
13 Mar 2024
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Soumen Basu
Mayuna Gupta
Chetan Madan
Pankaj Gupta
Chetan Arora
28
4
0
13 Mar 2024
CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression
Xinjie Zhang
Shenyuan Gao
Zhening Liu
Jiawei Shao
Xingtong Ge
Dailan He
Tongda Xu
Yan Wang
Jun Zhang
33
1
0
13 Mar 2024
Spatiotemporal Representation Learning for Short and Long Medical Image Time Series
Chengzhi Shen
M. Menten
Hrvoje Bogunović
U. Schmidt-Erfurth
H. Scholl
S. Sivaprasad
A. Lotery
Daniel Rueckert
Paul Hager
Robbie Holland
16
2
0
12 Mar 2024
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
W. G. C. Bandara
Vishal M. Patel
VPVLM
VLM
28
1
0
11 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
30
179
0
11 Mar 2024
Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models
Philip Harris
Michael Kagan
J. Krupa
B. Maier
Nathaniel Woodward
34
4
0
11 Mar 2024
Using Fiber Optic Bundles to Miniaturize Vision-Based Tactile Sensors
Julia Di
Zdravko Dugonjic
Will Fu
Tingfan Wu
Romeo Mercado
...
Richard E. Fan
G. Sonn
M. Cutkosky
Mike Lambeta
Roberto Calandra
16
9
0
08 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
29
10
0
08 Mar 2024
VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
Yifang Xu
Yunzhuo Sun
Zien Xie
Benxiang Zhai
Sidan Du
43
6
0
04 Mar 2024
Data-efficient Event Camera Pre-training via Disentangled Masked Modeling
Zhenpeng Huang
Chao Li
Hao Chen
Yongjian Deng
Yifeng Geng
Limin Wang
35
2
0
01 Mar 2024
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei
Tao Chen
XiRuo Jiang
Huafeng Liu
Zeren Sun
Yazhou Yao
VGen
32
9
0
29 Feb 2024
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
30
3
0
29 Feb 2024
SFTformer: A Spatial-Frequency-Temporal Correlation-Decoupling Transformer for Radar Echo Extrapolation
Liangyu Xu
Wanxuan Lu
Hongfeng Yu
Fanglong Yao
Xian Sun
Kun Fu
24
5
0
28 Feb 2024
MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation
Hanan Gani
Muzammal Naseer
Fahad Khan
Salman Khan
19
0
0
27 Feb 2024
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Yuxuan Wang
Yueqian Wang
Pengfei Wu
Jianxin Liang
Dongyan Zhao
Zilong Zheng
VLM
21
9
0
25 Feb 2024
ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis
Yuchen He
Zeqing Yuan
Yihong Wu
Liqi Cheng
Dazhen Deng
Yingcai Wu
22
4
0
25 Feb 2024
Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning
Wuyang Chen
Jialin Song
Pu Ren
Shashank Subramanian
Dmitriy Morozov
Michael W. Mahoney
AI4CE
35
9
0
24 Feb 2024
Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving
Yichen Xie
Hongge Chen
Gregory P. Meyer
Yong Jae Lee
Eric M. Wolff
Masayoshi Tomizuka
Wei Zhan
Yuning Chai
Xin Huang
3DPC
27
1
0
23 Feb 2024
Attention-Guided Masked Autoencoders For Learning Image Representations
Leon Sick
Dominik Engel
Pedro Hermosilla
Timo Ropinski
32
1
0
23 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
41
23
0
20 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
27
29
0
20 Feb 2024
VGMShield: Mitigating Misuse of Video Generative Models
Yan Pang
Yang Zhang
Tianhao Wang
32
3
0
20 Feb 2024
Learning Causal Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition
Yuke Li
Guangyi Chen
Ben Abramowitz
Stefano Anzellotti
Donglai Wei
TTA
38
1
0
20 Feb 2024
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDE
VLM
82
73
0
15 Feb 2024