ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.

iBOT: Image BERT Pre-Training with Online Tokenizer

15 November 2021
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong

Papers citing "iBOT: Image BERT Pre-Training with Online Tokenizer"

50 / 607 papers shown
Masked Siamese ConvNets
L. Jing
Jiachen Zhu
Yann LeCun
SSL
212
37
0
15 Jun 2022
A Simple Data Mixing Prior for Improving Self-Supervised Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Sucheng Ren
Huiyu Wang
Zhengqi Gao
Shengfeng He
Alan Yuille
Yuyin Zhou
Cihang Xie
183
42
0
15 Jun 2022
Rethinking Generalization in Few-Shot Classification
Neural Information Processing Systems (NeurIPS), 2022
Markus Hiller
Rongkai Ma
Mehrtash Harandi
Tom Drummond
OCL, VLM
358
84
0
15 Jun 2022
SERE: Exploring Feature Self-relation for Self-supervised Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Zhong-Yu Li
Shanghua Gao
Ming-Ming Cheng
ViT, MDE
253
19
0
10 Jun 2022
Extreme Masking for Learning Instance and Distributed Visual Representations
Zhirong Wu
Zihang Lai
Xiao Sun
Stephen Lin
297
24
0
09 Jun 2022
Spatial Entropy as an Inductive Bias for Vision Transformers
Machine-mediated learning (ML), 2022
E. Peruzzo
E. Sangineto
Yahui Liu
Marco De Nadai
Wei Bi
Bruno Lepri
Andrii Zadaianchuk
ViT, MDE
286
7
0
09 Jun 2022
Can CNNs Be More Robust Than Transformers?
International Conference on Learning Representations (ICLR), 2022
Zeyu Wang
Yutong Bai
Yuyin Zhou
Cihang Xie
UQCV, OOD
248
54
0
07 Jun 2022
On the duality between contrastive and non-contrastive self-supervised learning
International Conference on Learning Representations (ICLR), 2022
Q. Garrido
Yubei Chen
Adrien Bardes
Laurent Najman
Yann LeCun
SSL
307
112
0
03 Jun 2022
Siamese Image Modeling for Self-Supervised Vision Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Chenxin Tao
Xizhou Zhu
Weijie Su
Gao Huang
Bin Li
Jie Zhou
Yu Qiao
Xiaogang Wang
Jifeng Dai
SSL
301
107
0
02 Jun 2022
Exploring Advances in Transformers and CNN for Skin Lesion Diagnosis on Small Datasets
Brazilian Conference on Intelligent Systems (BRACIS), 2022
Leandro M. de Lima
R. Krohling
ViT, MedIm
148
14
0
30 May 2022
Self-Supervised Visual Representation Learning with Semantic Grouping
Neural Information Processing Systems (NeurIPS), 2022
Xin Wen
Bingchen Zhao
Anlin Zheng
Xinming Zhang
Xiaojuan Qi
SSL
417
85
0
30 May 2022
GMML is All you Need
IEEE International Conference on Image Processing (ICIP), 2022
Sara Atito
Muhammad Awais
J. Kittler
ViT, VLM
198
20
0
30 May 2022
A Closer Look at Self-Supervised Lightweight Vision Transformers
International Conference on Machine Learning (ICML), 2022
Shaoru Wang
Jin Gao
Zeming Li
Jian Sun
Weiming Hu
ViT
286
51
0
28 May 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Neural Information Processing Systems (NeurIPS), 2022
Renrui Zhang
Ziyu Guo
Rongyao Fang
Bingyan Zhao
Dong Wang
Yu Qiao
Jiaming Song
Shiyang Feng
3DPC
884
349
0
28 May 2022
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN
International Conference on Machine Learning (ICML), 2022
Siyuan Li
Di Wu
Fang Wu
Lei Shang
Stan Z. Li
228
60
0
27 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Neural Information Processing Systems (NeurIPS), 2022
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
626
936
0
26 May 2022
Green Hierarchical Vision Transformer for Masked Image Modeling
Neural Information Processing Systems (NeurIPS), 2022
Lang Huang
Shan You
Mingkai Zheng
Fei Wang
Chao Qian
T. Yamasaki
291
83
0
26 May 2022
HIRL: A General Framework for Hierarchical Image Representation Learning
Minghao Xu
Yuanfan Guo
Xuanyu Zhu
Jiawen Li
Zhenbang Sun
Jiangtao Tang
Yi Xu
Bingbing Ni
SSL
153
3
0
26 May 2022
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Jihao Liu
Xin Huang
Jinliang Zheng
Yu Liu
Jiaming Song
303
80
0
26 May 2022
Decoder Denoising Pretraining for Semantic Segmentation
Emmanuel B. Asiedu
Simon Kornblith
Ting Chen
Niki Parmar
Matthias Minderer
Mohammad Norouzi
AI4CE
487
28
0
23 May 2022
A Study on Transformer Configuration and Training Objective
International Conference on Machine Learning (ICML), 2022
Fuzhao Xue
Jianghai Chen
Aixin Sun
Xiaozhe Ren
Zangwei Zheng
Xiaoxin He
Yongming Chen
Xin Jiang
Yang You
208
10
0
21 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
305
86
0
20 May 2022
Masked Image Modeling with Denoising Contrast
International Conference on Learning Representations (ICLR), 2022
Kun Yi
Yixiao Ge
Xiaotong Li
Shusheng Yang
Dian Li
Jianping Wu
Ying Shan
Xiaohu Qie
VLM
209
65
0
19 May 2022
Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
IEEE International Conference on Computer Vision (ICCV), 2022
Yifan Zhang
Xiaosong Zhang
Zhiliang Peng
Zonghao Guo
Fang Wan
Xian-Wei Ji
QiXiang Ye
ObjD
228
28
0
19 May 2022
Multiplexed Immunofluorescence Brain Image Analysis Using Self-Supervised Dual-Loss Adaptive Masked Autoencoder
S. Ly
Bai Lin
Hung Q. Vo
D. Maric
B. Roysam
H. V. Nguyen
206
0
0
10 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Shiyang Feng
Teli Ma
Jiaming Song
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
256
151
0
08 May 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
European Conference on Computer Vision (ECCV), 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
162
48
0
26 Apr 2022
A Masked Image Reconstruction Network for Document-level Relation Extraction
Li Zhang
Yidong Cheng
135
2
0
21 Apr 2022
The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training
AAAI Conference on Artificial Intelligence (AAAI), 2022
Hao Liu
Xinghua Jiang
Xin Li
Antai Guo
Deqiang Jiang
Bo Ren
188
43
0
18 Apr 2022
Masked Siamese Networks for Label-Efficient Learning
European Conference on Computer Vision (ECCV), 2022
Mahmoud Assran
Mathilde Caron
Ishan Misra
Piotr Bojanowski
Florian Bordes
Pascal Vincent
Armand Joulin
Michael G. Rabbat
Nicolas Ballas
SSL
330
380
0
14 Apr 2022
DeiT III: Revenge of the ViT
European Conference on Computer Vision (ECCV), 2022
Hugo Touvron
Matthieu Cord
Edouard Grave
ViT
287
545
0
14 Apr 2022
Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels
Tianxin Tao
Daniele Reda
M. van de Panne
ViT
209
20
0
11 Apr 2022
Representation Learning by Detecting Incorrect Location Embeddings
AAAI Conference on Artificial Intelligence (AAAI), 2022
Sepehr Sameni
Simon Jenni
Paolo Favaro
ViT
226
6
0
10 Apr 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
IEEE International Conference on Computer Vision (ICCV), 2022
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
243
66
0
06 Apr 2022
MultiMAE: Multi-modal Multi-task Masked Autoencoders
European Conference on Computer Vision (ECCV), 2022
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
427
349
0
04 Apr 2022
Self-distillation Augmented Masked Autoencoders for Histopathological Image Classification
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022
Yang Luo
Zhineng Chen
Shengtian Zhou
Xieping Gao
289
3
0
31 Mar 2022
In-N-Out Generative Learning for Dense Unsupervised Video Segmentation
ACM Multimedia (ACM MM), 2022
Xiaomiao Pan
Peike Li
Zongxin Yang
Huiling Zhou
Chang Zhou
Hongxia Yang
Jingren Zhou
Yi Yang
VOS
241
12
0
29 Mar 2022
Large-scale Bilingual Language-Image Contrastive Learning
ByungSoo Ko
Geonmo Gu
VLM
277
17
0
28 Mar 2022
Mugs: A Multi-Granular Self-Supervised Learning Framework
Pan Zhou
Yichen Zhou
Chenyang Si
Weihao Yu
Teck Khim Ng
Shuicheng Yan
VLM
190
69
0
27 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
European Conference on Computer Vision (ECCV), 2022
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP, VLM
356
22
0
27 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Neural Information Processing Systems (NeurIPS), 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
739
1,640
0
23 Mar 2022
CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation
European Conference on Computer Vision (ECCV), 2022
Feng Wang
Huiyu Wang
Chen Wei
Alan Yuille
Wei Shen
SSL, VLM
260
39
0
22 Mar 2022
Three things everyone should know about Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Jakob Verbeek
Edouard Grave
ViT
247
155
0
18 Mar 2022
MVP: Multimodality-guided Visual Pre-training
European Conference on Computer Vision (ECCV), 2022
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
236
128
0
10 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
ACM Multimedia (ACM MM), 2022
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT, VLM
400
211
0
04 Mar 2022
Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luís Vilaça
Yi Yu
Paula Viana
242
11
0
28 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
International Conference on Machine Learning (ICML), 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL, VLM, ViT
569
1,037
0
07 Feb 2022
Corrupted Image Modeling for Self-Supervised Visual Pre-Training
International Conference on Learning Representations (ICLR), 2022
Yuxin Fang
Li Dong
Hangbo Bao
Xinggang Wang
Furu Wei
303
92
0
07 Feb 2022
Context Autoencoder for Self-Supervised Representation Learning
International Journal of Computer Vision (IJCV), 2022
Xiaokang Chen
Mingyu Ding
Xiaodi Wang
Ying Xin
Shentong Mo
Yunhao Wang
Shumin Han
Ping Luo
Gang Zeng
Jingdong Wang
SSL
487
454
0
07 Feb 2022
Adversarial Masking for Self-Supervised Learning
International Conference on Machine Learning (ICML), 2022
Yuge Shi
N. Siddharth
Juil Sock
Adam R. Kosiorek
SSL
448
101
0
31 Jan 2022
Page 12 of 13