2111.07832
Cited By
iBOT: Image BERT Pre-Training with Online Tokenizer
15 November 2021
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong
Papers citing "iBOT: Image BERT Pre-Training with Online Tokenizer"
50 / 602 papers shown
Title
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Computer Vision and Pattern Recognition (CVPR), 2025
Siyuan Li
Guang Dai
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
242
6
0
01 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
395
34
0
01 Apr 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
257
2
0
30 Mar 2025
Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets
Martin Kiss
Michal Hradiš
163
0
0
28 Mar 2025
Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
Paul Koch
Jörg Krüger
Ankit Chowdhury
O. Heimann
MDE
224
0
0
25 Mar 2025
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham
Juan C. Caicedo
Bryan A. Plummer
201
2
0
25 Mar 2025
Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation
Qin Wang
Benjamin Bruns
Hanno Scharr
Kai Krajsek
205
1
0
24 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
230
0
0
20 Mar 2025
Object-Centric Pretraining via Target Encoder Bootstrapping
International Conference on Learning Representations (ICLR), 2025
Nikola Đukić
Tim Lebailly
Tinne Tuytelaars
OCL
252
0
0
19 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
401
0
0
19 Mar 2025
Cube: A Roblox View of 3D Intelligence
Foundation AI Team Roblox
Kiran Bhat
Nishchaie Khanna
Karun Channa
Tinghui Zhou
...
Kyle Price
Steve Han
Yiqing Wang
A. Singh
David Baszucki
232
5
0
19 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
220
4
0
17 Mar 2025
Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation
Leonard Waldmann
Ando Shah
Yi Wang
Nils Lehmann
Adam J. Stewart
Zhitong Xiong
Xiao Xiang Zhu
Stefan Bauer
John Chuang
204
13
0
13 Mar 2025
Robustness Tokens: Towards Adversarial Robustness of Transformers
European Conference on Computer Vision (ECCV), 2025
Brian Pulfer
Yury Belousov
S. Voloshynovskiy
AAML
198
0
0
13 Mar 2025
Freeze and Cluster: A Simple Baseline for Rehearsal-Free Continual Category Discovery
Chuyu Zhang
Xueyang Yu
Peiyan Gu
Xuming He
CLL
366
0
0
12 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
360
4
0
12 Mar 2025
Task-Agnostic Attacks Against Vision Foundation Models
Brian Pulfer
Yury Belousov
Vitaliy Kinakh
Teddy Furon
S. Voloshynovskiy
AAML
193
0
0
05 Mar 2025
Projection Head is Secretly an Information Bottleneck
International Conference on Learning Representations (ICLR), 2025
Zhuo Ouyang
Kaiwen Hu
Qi Zhang
Yifei Wang
Yisen Wang
285
4
0
01 Mar 2025
Solving Instance Detection from an Open-World Perspective
Computer Vision and Pattern Recognition (CVPR), 2025
Qianqian Shen
Yunhan Zhao
Nahyun Kwon
Jeeeun Kim
Yanan Li
Shu Kong
320
2
0
01 Mar 2025
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
IEEE Transactions on Medical Imaging (IEEE TMI), 2025
Tianyi Wang
Jianan Fan
Dingxin Zhang
Dongnan Liu
Yong-quan Xia
Heng Huang
Weidong Cai
488
3
0
01 Mar 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
452
14
0
24 Feb 2025
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Aurian Quélennec
Pierre Chouteau
Geoffroy Peeters
S. Essid
SSL
331
6
0
17 Feb 2025
Simplifying DINO via Coding Rate Regularization
Ziyang Wu
Jingyuan Zhang
Druv Pai
Xinze Wang
Chandan Singh
Jianwei Yang
Jianfeng Gao
Yi-An Ma
1.2K
9
0
17 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
542
2
0
10 Feb 2025
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan
Julian Forsyth
Thomas Fel
M. Kowal
Konstantinos G. Derpanis
268
22
0
06 Feb 2025
A generalizable 3D framework and model for self-supervised learning in medical imaging
Tony Xu
Sepehr Hosseini
Chris Anderson
Anthony Rinaldi
Rahul G. Krishnan
Anne L. Martel
Maged Goubran
MedIm
281
6
0
20 Jan 2025
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks?
Wenxuan Li
Yaoyao Liu
Zongwei Zhou
MedIm
266
14
0
20 Jan 2025
Keypoint Aware Masked Image Modelling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Madhava Krishna
Convin.AI
347
1
0
03 Jan 2025
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Shentong Mo
177
1
0
23 Dec 2024
Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction
Qin Wang
Kai Krajsek
Hanno Scharr
SSL
130
2
0
04 Dec 2024
Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning
Varun Belagali
Srikar Yellapragada
Alexandros Graikos
S. Kapse
Zilinghan Li
Tarak Nandi
Ravi K. Madduri
Prateek Prasanna
Joel H. Saltz
Dimitris Samaras
DiffM
246
2
0
02 Dec 2024
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Xuweiyi Chen
Markus Marks
Zezhou Cheng
435
3
0
25 Nov 2024
Multi-Token Enhancing for Vision Representation Learning
Zhong-Yu Li
Yu-Song Hu
Bo Yin
Ming-Ming Cheng
388
1
0
24 Nov 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
321
0
0
24 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luis Vilaca
Yi Yu
Paula Vinan
418
1
0
24 Nov 2024
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
T. Lin
Jinglei Zhang
Yi Xu
Kai Chen
Rui Zhang
Chong Chen
299
0
0
18 Nov 2024
Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement
Neural Information Processing Systems (NeurIPS), 2024
Yanyan Huang
Weiqin Zhao
Yihang Chen
Yu Fu
Lequan Yu
MedIm
244
7
0
15 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
546
2
0
15 Nov 2024
Understanding the Role of Equivariance in Self-supervised Learning
Neural Information Processing Systems (NeurIPS), 2024
Yifei Wang
Kaiwen Hu
Sharut Gupta
Ziyu Ye
Yisen Wang
Stefanie Jegelka
SSL
258
6
0
10 Nov 2024
Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024
Kaixuan Lu
Ruiqian Zhang
Xiao Huang
Yuxing Xie
Xiaogang Ning
Hanchao Zhang
Mengke Yuan
Pan Zhang
Tao Wang
Tongkui Liao
203
3
0
09 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Neural Information Processing Systems (NeurIPS), 2024
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
347
6
0
05 Nov 2024
Masked Autoencoders are Parameter-Efficient Federated Continual Learners
BigData Congress [Services Society] (BSS), 2024
Yuchen He
Xiangfeng Wang
CLL
FedML
213
0
0
04 Nov 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Conference on Robot Learning (CoRL), 2024
Carolina Higuera
Akash Sharma
Chaithanya Krishna Bodduluri
Taosha Fan
Patrick E. Lancaster
...
Michael Kaess
Byron Boots
Mike Lambeta
Tingfan Wu
Mustafa Mukadam
218
45
0
31 Oct 2024
A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization
Zhong Ji
Steve Yang
Jingren Liu
Yanwei Pang
Jungong Han
294
2
0
29 Oct 2024
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models
Neural Information Processing Systems (NeurIPS), 2024
Shenghao Fu
Junkai Yan
Q. Yang
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
VLM
225
11
0
25 Oct 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
Neural Information Processing Systems (NeurIPS), 2024
Shentong Mo
Shengbang Tong
237
5
0
25 Oct 2024
SRA: A Novel Method to Improve Feature Embedding in Self-supervised Learning for Histopathological Images
Hamid Manoochehri
Bodong Zhang
Beatrice Knudsen
Tolga Tasdizen
227
0
0
23 Oct 2024
Benchmarking Pathology Foundation Models: Adaptation Strategies and Scenarios
Jeaung Lee
Jeewoo Lim
Keunho Byeon
Jin Tae Kwak
161
13
0
21 Oct 2024
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han
Longhui Wei
Bushi Liu
Zipeng Wang
Chenhui Qiang
Xin He
Yingfei Sun
Zhenjun Han
Qi Tian
MoE
373
11
0
21 Oct 2024
Upsampling DINOv2 features for unsupervised vision tasks and weakly supervised materials segmentation
Ronan Docherty
Antonis Vamvakeros
Samuel J. Cooper
301
3
0
20 Oct 2024