Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.09886
Cited By
SimMIM: A Simple Framework for Masked Image Modeling
18 November 2021
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Yutong Lin
Jianmin Bao
Zhuliang Yao
Qi Dai
Han Hu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SimMIM: A Simple Framework for Masked Image Modeling"
23 / 173 papers shown
Title
V
2
^2
2
L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval
Wenhao Wang
Yifan Sun
Zongxin Yang
Yi Yang
VLM
10
3
0
26 Jul 2022
Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds
Georg Hess
Johan Jaxing
Elias Svensson
David Hagerman
Christoffer Petersson
Lennart Svensson
3DPC
ViT
16
33
0
01 Jul 2022
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision
Sanat Ramesh
V. Srivastav
Deepak Alapatt
Tong Yu
Aditya Murali
...
Saurav Sharma
A. Fleurentin
Georgios Exarchakis
Alexandros Karargyris
N. Padoy
16
42
0
01 Jul 2022
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
Mingkun Yang
Minghui Liao
Pu Lu
Jing Wang
Shenggao Zhu
Hualin Luo
Qingzhen Tian
X. Bai
SSL
27
55
0
01 Jul 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
28
32
0
19 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
25
95
0
16 Jun 2022
GMML is All you Need
Sara Atito
Muhammad Awais
J. Kittler
ViT
VLM
34
18
0
30 May 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
50
26
0
30 May 2022
Object-wise Masked Autoencoders for Fast Pre-training
Jiantao Wu
Shentong Mo
ViT
OCL
17
15
0
28 May 2022
Green Hierarchical Vision Transformer for Masked Image Modeling
Lang Huang
Shan You
Mingkai Zheng
Fei Wang
Chao Qian
T. Yamasaki
13
68
0
26 May 2022
Automatic segmentation of meniscus based on MAE self-supervision and point-line weak supervision paradigm
Yuhan Xie
Kexin Jiang
Zhiyong Zhang
Shaolong Chen
Xiaodong Zhang
Changzhen Qiu
13
1
0
07 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
42
1,249
0
04 May 2022
Masked Siamese Networks for Label-Efficient Learning
Mahmoud Assran
Mathilde Caron
Ishan Misra
Piotr Bojanowski
Florian Bordes
Pascal Vincent
Armand Joulin
Michael G. Rabbat
Nicolas Ballas
SSL
11
311
0
14 Apr 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
6
54
0
06 Apr 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
37
1,114
0
23 Mar 2022
Multi-Modal Masked Pre-Training for Monocular Panoramic Depth Completion
Zhiqiang Yan
Xiang Li
Kun Wang
Zhenyu Zhang
Jun Yu Li
Jian Yang
MDE
22
32
0
18 Mar 2022
MVP: Multimodality-guided Visual Pre-training
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
11
104
0
10 Mar 2022
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
33
237
0
24 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
41
1,713
0
18 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi
Yin Cui
A. Srinivas
Rui Qian
Tsung-Yi Lin
E. D. Cubuk
Quoc V. Le
Barret Zoph
ISeg
223
962
0
13 Dec 2020
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
249
1,817
0
18 Aug 2016
Previous
1
2
3
4