iBOT: Image BERT Pre-Training with Online Tokenizer

15 November 2021
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong

Papers citing "iBOT: Image BERT Pre-Training with Online Tokenizer"

50 / 602 papers shown
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Computer Vision and Pattern Recognition (CVPR), 2025
Siyuan Li
Guang Dai
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
242
6
0
01 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP, VLM
395
34
0
01 Apr 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
257
2
0
30 Mar 2025
Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets
Martin Kiss
Michal Hradiš
163
0
0
28 Mar 2025
Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
Paul Koch
Jörg Krüger
Ankit Chowdhury
O. Heimann
MDE
224
0
0
25 Mar 2025
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham
Juan C. Caicedo
Bryan A. Plummer
201
2
0
25 Mar 2025
Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation
Qin Wang
Benjamin Bruns
Hanno Scharr
Kai Krajsek
205
1
0
24 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
230
0
0
20 Mar 2025
Object-Centric Pretraining via Target Encoder Bootstrapping
International Conference on Learning Representations (ICLR), 2025
Nikola Đukić
Tim Lebailly
Tinne Tuytelaars
OCL
252
0
0
19 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
401
0
0
19 Mar 2025
Cube: A Roblox View of 3D Intelligence
Foundation AI Team Roblox
Kiran Bhat
Nishchaie Khanna
Karun Channa
Tinghui Zhou
...
Kyle Price
Steve Han
Yiqing Wang
A. Singh
David Baszucki
232
5
0
19 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
220
4
0
17 Mar 2025
Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation
Leonard Waldmann
Ando Shah
Yi Wang
Nils Lehmann
Adam J. Stewart
Zhitong Xiong
Xiao Xiang Zhu
Stefan Bauer
John Chuang
204
13
0
13 Mar 2025
Robustness Tokens: Towards Adversarial Robustness of Transformers
European Conference on Computer Vision (ECCV), 2025
Brian Pulfer
Yury Belousov
S. Voloshynovskiy
AAML
198
0
0
13 Mar 2025
Freeze and Cluster: A Simple Baseline for Rehearsal-Free Continual Category Discovery
Chuyu Zhang
Xueyang Yu
Peiyan Gu
Xuming He
CLL
366
0
0
12 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
360
4
0
12 Mar 2025
Task-Agnostic Attacks Against Vision Foundation Models
Brian Pulfer
Yury Belousov
Vitaliy Kinakh
Teddy Furon
S. Voloshynovskiy
AAML
193
0
0
05 Mar 2025
Projection Head is Secretly an Information Bottleneck
International Conference on Learning Representations (ICLR), 2025
Zhuo Ouyang
Kaiwen Hu
Qi Zhang
Yifei Wang
Yisen Wang
285
4
0
01 Mar 2025
Solving Instance Detection from an Open-World Perspective
Computer Vision and Pattern Recognition (CVPR), 2025
Qianqian Shen
Yunhan Zhao
Nahyun Kwon
Jeeeun Kim
Yanan Li
Shu Kong
320
2
0
01 Mar 2025
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
IEEE Transactions on Medical Imaging (IEEE TMI), 2025
Tianyi Wang
Jianan Fan
Dingxin Zhang
Dongnan Liu
Yong-quan Xia
Heng Huang
Weidong Cai
488
3
0
01 Mar 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
452
14
0
24 Feb 2025
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Aurian Quélennec
Pierre Chouteau
Geoffroy Peeters
S. Essid
SSL
331
6
0
17 Feb 2025
Simplifying DINO via Coding Rate Regularization
Ziyang Wu
Jingyuan Zhang
Druv Pai
Xinze Wang
Chandan Singh
Jianwei Yang
Jianfeng Gao
Yi-An Ma
1.2K
9
0
17 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
542
2
0
10 Feb 2025
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan
Julian Forsyth
Thomas Fel
M. Kowal
Konstantinos G. Derpanis
268
22
0
06 Feb 2025
A generalizable 3D framework and model for self-supervised learning in medical imaging
Tony Xu
Sepehr Hosseini
Chris Anderson
Anthony Rinaldi
Rahul G. Krishnan
Anne L. Martel
Maged Goubran
MedIm
281
6
0
20 Jan 2025
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks?
Wenxuan Li
Yaoyao Liu
Zongwei Zhou
MedIm
266
14
0
20 Jan 2025
Keypoint Aware Masked Image Modelling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Madhava Krishna
Convin.AI
347
1
0
03 Jan 2025
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Shentong Mo
177
1
0
23 Dec 2024
Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction
Qin Wang
Kai Krajsek
Hanno Scharr
SSL
130
2
0
04 Dec 2024
Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning
Varun Belagali
Srikar Yellapragada
Alexandros Graikos
S. Kapse
Zilinghan Li
Tarak Nandi
Ravi K. Madduri
Prateek Prasanna
Joel H. Saltz
Dimitris Samaras
DiffM
246
2
0
02 Dec 2024
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Xuweiyi Chen
Markus Marks
Zezhou Cheng
435
3
0
25 Nov 2024
Multi-Token Enhancing for Vision Representation Learning
Zhong-Yu Li
Yu-Song Hu
Bo Yin
Ming-Ming Cheng
388
1
0
24 Nov 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
321
0
0
24 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luis Vilaca
Yi Yu
Paula Vinan
418
1
0
24 Nov 2024
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
T. Lin
Jinglei Zhang
Yi Xu
Kai Chen
Rui Zhang
Chong Chen
299
0
0
18 Nov 2024
Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement
Neural Information Processing Systems (NeurIPS), 2024
Yanyan Huang
Weiqin Zhao
Yihang Chen
Yu Fu
Lequan Yu
MedIm
244
7
0
15 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
546
2
0
15 Nov 2024
Understanding the Role of Equivariance in Self-supervised Learning
Neural Information Processing Systems (NeurIPS), 2024
Yifei Wang
Kaiwen Hu
Sharut Gupta
Ziyu Ye
Yisen Wang
Stefanie Jegelka
SSL
258
6
0
10 Nov 2024
Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024
Kaixuan Lu
Ruiqian Zhang
Xiao Huang
Yuxing Xie
Xiaogang Ning
Hanchao Zhang
Mengke Yuan
Pan Zhang
Tao Wang
Tongkui Liao
203
3
0
09 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Neural Information Processing Systems (NeurIPS), 2024
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP, VLM
347
6
0
05 Nov 2024
Masked Autoencoders are Parameter-Efficient Federated Continual Learners
BigData Congress [Services Society] (BSS), 2024
Yuchen He
Xiangfeng Wang
CLL, FedML
213
0
0
04 Nov 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Conference on Robot Learning (CoRL), 2024
Carolina Higuera
Akash Sharma
Chaithanya Krishna Bodduluri
Taosha Fan
Patrick E. Lancaster
...
Michael Kaess
Byron Boots
Mike Lambeta
Tingfan Wu
Mustafa Mukadam
218
45
0
31 Oct 2024
A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization
Zhong Ji
Steve Yang
Jingren Liu
Yanwei Pang
Jungong Han
294
2
0
29 Oct 2024
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models
Neural Information Processing Systems (NeurIPS), 2024
Shenghao Fu
Junkai Yan
Q. Yang
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
VLM
225
11
0
25 Oct 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
Neural Information Processing Systems (NeurIPS), 2024
Shentong Mo
Shengbang Tong
237
5
0
25 Oct 2024
SRA: A Novel Method to Improve Feature Embedding in Self-supervised Learning for Histopathological Images
Hamid Manoochehri
Bodong Zhang
Beatrice Knudsen
Tolga Tasdizen
227
0
0
23 Oct 2024
Benchmarking Pathology Foundation Models: Adaptation Strategies and Scenarios
Jeaung Lee
Jeewoo Lim
Keunho Byeon
Jin Tae Kwak
161
13
0
21 Oct 2024
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han
Longhui Wei
Bushi Liu
Zipeng Wang
Chenhui Qiang
Xin He
Yingfei Sun
Zhenjun Han
Qi Tian
MoE
373
11
0
21 Oct 2024
Upsampling DINOv2 features for unsupervised vision tasks and weakly supervised materials segmentation
Ronan Docherty
Antonis Vamvakeros
Samuel J. Cooper
301
3
0
20 Oct 2024