Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.11250
Cited By
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
21 June 2021
Hao Tan
Jie Lei
Thomas Wolf
Mohit Bansal
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning"
14 / 14 papers shown
Title
Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning
Mahsa Ehsanpour
Ian Reid
Hamid Rezatofighi
ViT
27
0
0
08 Apr 2024
M-BEV: Masked BEV Perception for Robust Autonomous Driving
Siran Chen
Yue Ma
Yu Qiao
Yali Wang
19
8
0
19 Dec 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
22
29
0
21 Aug 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
C. L. P. Chen
Mu Li
ViT
24
143
0
06 Feb 2023
CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition
Cheng Lu
Xiaojie Jin
Zhicheng Huang
Qibin Hou
Mingg-Ming Cheng
Jiashi Feng
22
8
0
15 Jan 2023
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
36
70
0
30 Jul 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
17
95
0
16 Jun 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
25
651
0
16 Dec 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
34
215
0
24 Nov 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
237
482
0
20 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
235
3,029
0
09 Mar 2020
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,435
0
26 Sep 2016
1