2106.08254
BEiT: BERT Pre-Training of Image Transformers
15 June 2021
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
Papers citing
"BEiT: BERT Pre-Training of Image Transformers"
38 / 1,788 papers shown
Title
iBOT: Image BERT Pre-Training with Online Tokenizer
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong
19
709
0
15 Nov 2021
Attention Mechanisms in Computer Vision: A Survey
Meng-Hao Guo
Tianhan Xu
Jiangjiang Liu
Zheng-Ning Liu
Peng-Tao Jiang
Tai-Jiang Mu
Song-Hai Zhang
Ralph Robert Martin
Ming-Ming Cheng
Shimin Hu
19
1,633
0
15 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
302
7,434
0
11 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
69
330
0
11 Nov 2021
Are we ready for a new paradigm shift? A Survey on Visual Deep MLP
Ruiyang Liu
Yinghui Li
Li Tao
Dun Liang
Haitao Zheng
79
96
0
07 Nov 2021
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou
Yichong Xu
Zhe Gan
Jianfeng Wang
Shuohang Wang
...
Pengchuan Zhang
Lu Yuan
Nanyun Peng
Zicheng Liu
Michael Zeng
VLM
27
368
0
03 Nov 2021
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLM
MLLM
MoE
20
532
0
03 Nov 2021
GenURL: A General Framework for Unsupervised Representation Learning
Siyuan Li
Zicheng Liu
Z. Zang
Di Wu
Zhiyuan Chen
Stan Z. Li
OOD
3DGS
OffRL
26
9
0
27 Oct 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
30
268
0
19 Oct 2021
Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
Te-Lin Wu
Alexander Spangher
Pegah Alipoormolabashi
Marjorie Freedman
R. Weischedel
Nanyun Peng
13
20
0
16 Oct 2021
Self-Supervised Learning by Estimating Twin Class Distributions
Feng Wang
Tao Kong
Rufeng Zhang
Huaping Liu
Hang Li
SSL
50
16
0
14 Oct 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
112
192
0
14 Oct 2021
Rethinking Supervised Pre-training for Better Downstream Transferring
Yutong Feng
Jianwen Jiang
Mingqian Tang
R. L. Jin
Yue Gao
SSL
43
39
0
12 Oct 2021
Pre-trained Language Models in Biomedical Domain: A Systematic Survey
Benyou Wang
Qianqian Xie
Jiahuan Pei
Zhihong Chen
Prayag Tiwari
Zhao Li
Jie Fu
LM&MA
AI4CE
37
163
0
11 Oct 2021
Vector-quantized Image Modeling with Improved VQGAN
Jiahui Yu
Xin Li
Jing Yu Koh
Han Zhang
Ruoming Pang
James Qin
Alexander Ku
Yuanzhong Xu
Jason Baldridge
Yonghui Wu
ViT
VLM
DRL
49
476
0
09 Oct 2021
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Minghao Li
Tengchao Lv
Jingye Chen
Lei Cui
Yijuan Lu
D. Florêncio
Cha Zhang
Zhoujun Li
Furu Wei
ViT
93
340
0
21 Sep 2021
Memory Based Video Scene Parsing
Zhenchao Jin
Dongdong Yu
Kai Su
Zehuan Yuan
Changhu Wang
VLM
19
3
0
01 Sep 2021
Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation
Cam-Ngoan Nguyen
Zuhayr Asad
Yuankai Huo
ViT
MedIm
19
35
0
26 Aug 2021
When Do Contrastive Learning Signals Help Spatio-Temporal Graph Forecasting?
Xu Liu
Yuxuan Liang
Chao Huang
Yu Zheng
Bryan Hooi
Roger Zimmermann
AI4TS
13
60
0
26 Aug 2021
How Self-Supervised Learning Can be Used for Fine-Grained Head Pose Estimation?
Mahdi Pourmirzaei
Farzaneh Esmaili
G. Montazer
Sasan Karamizadeh
Seyedehsamaneh Shojaeilangari
19
0
0
10 Aug 2021
On The State of Data In Computer Vision: Human Annotations Remain Indispensable for Developing Deep Learning Models
Z. Emam
Andrew Kondrich
Sasha Harrison
Felix Lau
Yushi Wang
Aerin Kim
E. Branson
VLM
20
11
0
31 Jul 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLM
SyDa
23
3,828
0
28 Jul 2021
VisDA-2021 Competition Universal Domain Adaptation to Improve Performance on Out-of-Distribution Data
D. Bashkirova
Dan Hendrycks
Donghyun Kim
Samarth Mishra
Kate Saenko
Kuniaki Saito
Piotr Teterwak
Ben Usman
OOD
10
19
0
23 Jul 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
22
5,044
0
07 Jul 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
41
313
0
24 Jun 2021
Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
SSL
19
3
0
01 Jun 2021
ResMLP: Feedforward networks for image classification with data-efficient training
Hugo Touvron
Piotr Bojanowski
Mathilde Caron
Matthieu Cord
Alaaeldin El-Nouby
...
Gautier Izacard
Armand Joulin
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
VLM
16
655
0
07 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
303
5,773
0
29 Apr 2021
Do We Really Need Dice? The Hidden Region-Size Biases of Segmentation Losses
Bingyuan Liu
Jose Dolz
Adrian Galdran
Riadh Kobbi
Ismail Ben Ayed
21
15
0
18 Apr 2021
mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
Zewen Chi
Li Dong
Shuming Ma
Shaohan Huang
Xian-Ling Mao
Heyan Huang
Furu Wei
LRM
45
71
0
18 Apr 2021
SiT: Self-supervised vIsion Transformer
Sara Atito Ali Ahmed
Muhammad Awais
J. Kittler
ViT
31
139
0
08 Apr 2021
Creativity and Machine Learning: A Survey
Giorgio Franceschelli
Mirco Musolesi
VLM
AI4CE
19
40
0
06 Apr 2021
UNETR: Transformers for 3D Medical Image Segmentation
Ali Hatamizadeh
Yucheng Tang
Vishwesh Nath
Dong Yang
Andriy Myronenko
Bennett Landman
H. Roth
Daguang Xu
ViT
MedIm
39
1,533
0
18 Mar 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,774
0
24 Feb 2021
A Survey on Visual Transformer
Kai Han
Yunhe Wang
Hanting Chen
Xinghao Chen
Jianyuan Guo
...
Chunjing Xu
Yixing Xu
Zhaohui Yang
Yiman Zhang
Dacheng Tao
ViT
18
2,128
0
23 Dec 2020
Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation
S. Honari
Victor Constantin
Helge Rhodin
Mathieu Salzmann
Pascal Fua
3DH
26
10
0
02 Dec 2020
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
252
3,369
0
09 Mar 2020
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
253
1,827
0
18 Aug 2016