ResearchTrend.AI
BEiT: BERT Pre-Training of Image Transformers (arXiv:2106.08254)
15 June 2021
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
    ViT

Papers citing "BEiT: BERT Pre-Training of Image Transformers"

50 / 1,788 papers shown
Global Contrast Masked Autoencoders Are Powerful Pathological Representation Learners
Hao Quan
Xingyu Li
Weixing Chen
Qun Bai
Mingchen Zou
Ruijie Yang
Tingting Zheng
R. Qi
Xin Gao
Xiaoyu Cui
MedIm
28
19
0
18 May 2022
Label-Efficient Self-Supervised Federated Learning for Tackling Data Heterogeneity in Medical Imaging
Rui Yan
Liangqiong Qu
Qingyue Wei
Shih-Cheng Huang
Liyue Shen
D. Rubin
Lei Xing
Yuyin Zhou
FedML
78
89
0
17 May 2022
Vision Transformer Adapter for Dense Predictions
Zhe Chen
Yuchen Duan
Wenhai Wang
Junjun He
Tong Lu
Jifeng Dai
Yu Qiao
43
541
0
17 May 2022
CONSENT: Context Sensitive Transformer for Bold Words Classification
Ionut Sandu
Daniel Voinea
A. Popa
21
3
0
16 May 2022
VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder
Yuchao Gu
Xintao Wang
Liangbin Xie
Chao Dong
Gengyan Li
Ying Shan
Mingg-Ming Cheng
24
115
0
13 May 2022
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Zixin Wen
Yuanzhi Li
SSL
27
34
0
12 May 2022
One Model, Multiple Modalities: A Sparsely Activated Approach for Text, Sound, Image, Video and Code
Yong Dai
Duyu Tang
Liangxin Liu
Minghuan Tan
Cong Zhou
Jingquan Wang
Zhangyin Feng
Fan Zhang
Xueyu Hu
Shuming Shi
VLM
MoE
25
26
0
12 May 2022
An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers
Gokul Karthik Kumar
Sahal Shaji Mullappilly
Abhishek Singh Gehlot
ViT
23
1
0
11 May 2022
Multiplexed Immunofluorescence Brain Image Analysis Using Self-Supervised Dual-Loss Adaptive Masked Autoencoder
S. Ly
Bai Lin
Hung Q. Vo
D. Maric
B. Roysam
H. V. Nguyen
26
0
0
10 May 2022
Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains
Haiyang Yang
Meilin Chen
Yizhou Wang
Shixiang Tang
Feng Zhu
Lei Bai
Rui Zhao
Wanli Ouyang
19
16
0
10 May 2022
Activating More Pixels in Image Super-Resolution Transformer
Xiangyu Chen
Xintao Wang
Jiantao Zhou
Yu Qiao
Chao Dong
ViT
59
600
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
MINI: Mining Implicit Novel Instances for Few-Shot Object Detection
Yuhang Cao
Jiaqi Wang
Yiqi Lin
Dahua Lin
ObjD
25
5
0
06 May 2022
Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition
H. Kasaei
Songsong Xiong
14
12
0
04 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
61
1,255
0
04 May 2022
GPUNet: Searching the Deployable Convolution Neural Networks for GPUs
Linnan Wang
Chenhan D. Yu
Satish Salian
Slawomir Kierat
Szymon Migacz
A. Fit-Florea
12
11
0
26 Apr 2022
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
26
512
0
26 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
13
43
0
26 Apr 2022
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
29
65
0
26 Apr 2022
Masked Image Modeling Advances 3D Medical Image Analysis
Zekai Chen
Devansh Agarwal
Kshitij Aggarwal
Wiem Safta
Samit Hirawat
V. Sethuraman
Mariann Micsinai Balan
Kevin Brown
18
69
0
25 Apr 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
23
22
0
22 Apr 2022
A Masked Image Reconstruction Network for Document-level Relation Extraction
L. Zhang
Yidong Cheng
19
2
0
21 Apr 2022
Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval
Mustafa Shukor
Guillaume Couairon
Asya Grechka
Matthieu Cord
ViT
27
18
0
20 Apr 2022
Residual Mixture of Experts
Lemeng Wu
Mengchen Liu
Yinpeng Chen
Dongdong Chen
Xiyang Dai
Lu Yuan
MoE
22
36
0
20 Apr 2022
Neuro-BERT: Rethinking Masked Autoencoding for Self-supervised Neurological Pretraining
Di Wu
Siyuan Li
Jie Yang
Mohamad Sawan
SSL
28
14
0
20 Apr 2022
On the Representation Collapse of Sparse Mixture of Experts
Zewen Chi
Li Dong
Shaohan Huang
Damai Dai
Shuming Ma
...
Payal Bajaj
Xia Song
Xian-Ling Mao
Heyan Huang
Furu Wei
MoMe
MoE
37
96
0
20 Apr 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
25
432
0
18 Apr 2022
Visio-Linguistic Brain Encoding
S. Oota
Jashn Arora
Vijay Rowtula
Manish Gupta
R. Bapi
AI4CE
14
15
0
18 Apr 2022
The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training
Hao Liu
Xinghua Jiang
Xin Li
Antai Guo
Deqiang Jiang
Bo Ren
24
36
0
18 Apr 2022
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Haoyu Lu
Nanyi Fei
Yuqi Huo
Yizhao Gao
Zhiwu Lu
Jiaxin Wen
CLIP
VLM
19
54
0
15 Apr 2022
Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference
S. Hu
Da Li
Jan Stuhmer
Minyoung Kim
Timothy M. Hospedales
19
188
0
15 Apr 2022
Masked Siamese Networks for Label-Efficient Learning
Mahmoud Assran
Mathilde Caron
Ishan Misra
Piotr Bojanowski
Florian Bordes
Pascal Vincent
Armand Joulin
Michael G. Rabbat
Nicolas Ballas
SSL
28
311
0
14 Apr 2022
DeiT III: Revenge of the ViT
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
42
388
0
14 Apr 2022
3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume
Jianye Pang
Cheng Jiang
Yihao Chen
Jianbo Chang
M. Feng
Renzhi Wang
Jianhua Yao
ViT
MedIm
28
11
0
14 Apr 2022
Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels
Tianxin Tao
Daniele Reda
M. van de Panne
ViT
11
19
0
11 Apr 2022
Representation Learning by Detecting Incorrect Location Embeddings
Sepehr Sameni
Simon Jenni
Paolo Favaro
ViT
29
4
0
10 Apr 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
23
55
0
06 Apr 2022
Region Rebalance for Long-Tailed Semantic Segmentation
Jiequan Cui
Yuhui Yuan
Zhisheng Zhong
Zhuotao Tian
Han Hu
Stephen Lin
Jiaya Jia
18
18
0
05 Apr 2022
MultiMAE: Multi-modal Multi-task Masked Autoencoders
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
32
265
0
04 Apr 2022
BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning
Zhi Hou
Baosheng Yu
Chaoyue Wang
Yibing Zhan
Dacheng Tao
ViT
20
11
0
04 Apr 2022
Improving Vision Transformers by Revisiting High-frequency Components
Jiawang Bai
Liuliang Yuan
Shutao Xia
Shuicheng Yan
Zhifeng Li
W. Liu
ViT
8
90
0
03 Apr 2022
POS-BERT: Point Cloud One-Stage BERT Pre-Training
Kexue Fu
Peng Gao
Shaolei Liu
Renrui Zhang
Yu Qiao
Manning Wang
3DPC
22
18
0
03 Apr 2022
UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation
Ali Hatamizadeh
Ziyue Xu
Dong Yang
Wenqi Li
H. Roth
Daguang Xu
ViT
MedIm
31
29
0
01 Apr 2022
Self-distillation Augmented Masked Autoencoders for Histopathological Image Classification
Yang Luo
Zhineng Chen
Shengtian Zhou
Xieping Gao
25
1
0
31 Mar 2022
Exploring Plain Vision Transformer Backbones for Object Detection
Yanghao Li
Hanzi Mao
Ross B. Girshick
Kaiming He
ViT
33
774
0
30 Mar 2022
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
Xiaotong Li
Yixiao Ge
Kun Yi
Zixuan Hu
Ying Shan
Ling-yu Duan
37
38
0
29 Mar 2022
In-N-Out Generative Learning for Dense Unsupervised Video Segmentation
Xiaomiao Pan
Peike Li
Zongxin Yang
Huiling Zhou
Chang Zhou
Hongxia Yang
Jingren Zhou
Yi Yang
VOS
24
11
0
29 Mar 2022
Mugs: A Multi-Granular Self-Supervised Learning Framework
Pan Zhou
Yichen Zhou
Chenyang Si
Weihao Yu
Teck Khim Ng
Shuicheng Yan
VLM
34
60
0
27 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
20
16
0
27 Mar 2022
Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers
Yunjie Tian
Lingxi Xie
Jiemin Fang
Mengnan Shi
Junran Peng
Xiaopeng Zhang
Jianbin Jiao
Qi Tian
QiXiang Ye
23
19
0
27 Mar 2022