2111.12710
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
24 November 2021
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
Papers citing
"PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers"
50 / 189 papers shown
Title
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
L. Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
46
0
0
01 Apr 2025
Should we pre-train a decoder in contrastive learning for dense prediction tasks?
S. Quetin
Tapotosh Ghosh
Farhad Maleki
SSL
69
0
0
21 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
X. Li
Jason Kuen
H. Chen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe-nan Lin
Marios Savvides
55
0
0
11 Mar 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
78
1
0
10 Feb 2025
Slot-BERT: Self-supervised Object Discovery in Surgical Video
Guiqiu Liao
M. Jogan
Marcel Hussing
Kenta Nakahashi
Kazuhiro Yasufuku
Amin Madani
Eric Eaton
Daniel A. Hashimoto
53
0
0
21 Jan 2025
Code and Pixels: Multi-Modal Contrastive Pre-training for Enhanced Tabular Data Analysis
Kankana Roy
Lars Krämer
Sebastian Domaschke
Malik Haris
Roland Aydin
Fabian Isensee
Martin Held
38
0
0
13 Jan 2025
Remote Inference over Dynamic Links via Adaptive Rate Deep Task-Oriented Vector Quantization
Eyal Fishel
M. Malka
Shai Ginzach
Nir Shlezinger
29
0
0
07 Jan 2025
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
Shentong Mo
35
0
0
23 Dec 2024
Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation
J. Zhang
Li Zhang
Shijian Li
VLM
71
0
0
18 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
H. Chen
Z. Wang
X. Li
X. Sun
Fangyi Chen
Jiang Liu
J. Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
106
6
0
14 Dec 2024
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
X. Li
Kai Qiu
H. Chen
Jason Kuen
Jiuxiang Gu
J. Wang
Zhe-nan Lin
Bhiksha Raj
VLM
114
3
0
02 Dec 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
64
0
0
24 Nov 2024
Restructuring Vector Quantization with the Rotation Trick
Christopher Fifty
Ronald G. Junkins
Dennis Duan
Aniketh Iger
Jerry W. Liu
Ehsan Amid
Sebastian Thrun
Christopher Ré
LLMSV
36
11
0
08 Oct 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Jiuxiang Gu
Bhiksha Raj
Zhe-nan Lin
VLM
34
17
0
02 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
47
2
0
02 Oct 2024
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
Linyan Yang
Lukas Hoyer
Mark Weber
Tobias Fischer
Dengxin Dai
Laura Leal-Taixé
Marc Pollefeys
Daniel Cremers
Luc Van Gool
MDE
21
3
0
29 Aug 2024
FungiTastic: A multi-modal dataset and benchmark for image categorization
Lukáš Picek
Klara Janouskova
Milan Šulc
Jiří Matas
72
1
0
24 Aug 2024
Symmetric masking strategy enhances the performance of Masked Image Modeling
Khanh-Binh Nguyen
Chae Jung Park
19
0
0
23 Aug 2024
Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?
Chen Liang
Qiang Guo
Xiaochao Qu
Luoqi Liu
Ting Liu
VOS
27
0
0
20 Aug 2024
Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning
Xinrong Hu
Dewen Zeng
Yawen Wu
Xueyang Li
Yiyu Shi
ViT
MedIm
31
0
0
12 Aug 2024
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei
Abhinav Gupta
Pedro Morgado
SSL
34
7
0
22 Jul 2024
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification
Markus Marks
Manuel Knott
Neehar Kondapaneni
Elijah Cole
T. Defraeye
Fernando Pérez-Cruz
Pietro Perona
SSL
24
2
0
16 Jul 2024
On the Role of Discrete Tokenization in Visual Representation Learning
Tianqi Du
Yifei Wang
Yisen Wang
27
7
0
12 Jul 2024
WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images
Yannik Glaser
J. Stopa
Linnea M. Wolniewicz
Ralph Foster
Doug Vandemark
A. Mouche
Bertrand Chapron
Peter Sadowski
23
1
0
26 Jun 2024
SemanticMIM: Marring Masked Image Modeling with Semantics Compression for General Visual Representation
Yike Yuan
Huanzhang Dou
Fengjun Guo
Xi Li
18
2
0
15 Jun 2024
BIMM: Brain Inspired Masked Modeling for Video Representation Learning
Zhifan Wan
Jie M. Zhang
Chang-bo Li
Shiguang Shan
60
0
0
21 May 2024
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei
Zixuan Pan
Andrew Owens
VLM
21
6
0
14 May 2024
Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation
Chen Chen
Kai Qiao
Jie Yang
Jian Chen
Bin Yan
22
0
0
09 May 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
37
1
0
18 Apr 2024
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Diana-Nicoleta Grigore
Mariana-Iuliana Georgescu
J. A. Justo
T. Johansen
Andreea-Iuliana Ionescu
Radu Tudor Ionescu
21
0
0
14 Apr 2024
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Jihao Liu
Jinliang Zheng
Yu Liu
Hongsheng Li
VLM
19
3
0
11 Apr 2024
Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning
Mahsa Ehsanpour
Ian Reid
Hamid Rezatofighi
ViT
27
0
0
08 Apr 2024
Transformer based Pluralistic Image Completion with Reduced Information Loss
Qiankun Liu
Yuqi Jiang
Zhentao Tan
Dongdong Chen
Ying Fu
Qi Chu
Gang Hua
Nenghai Yu
ViT
50
11
0
31 Mar 2024
Adversarially Masked Video Consistency for Unsupervised Domain Adaptation
Xiaoyu Zhu
Junwei Liang
Po-Yao Huang
Alex Hauptmann
28
1
0
24 Mar 2024
Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Baoquan Zhang
Huaibin Wang
Chuyao Luo
Xutao Li
Guotao Liang
Yunming Ye
Xiaochen Qi
Yao He
27
11
0
15 Mar 2024
BootTOD: Bootstrap Task-oriented Dialogue Representations by Aligning Diverse Responses
Weihao Zeng
Keqing He
Yejie Wang
Dayuan Fu
Weiran Xu
20
0
0
02 Mar 2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
Haowei Liu
Yaya Shi
Haiyang Xu
Chunfen Yuan
Qinghao Ye
...
Mingshi Yan
Ji Zhang
Fei Huang
Bing Li
Weiming Hu
VLM
22
0
0
01 Mar 2024
MIMIC: Mask Image Pre-training with Mix Contrastive Fine-tuning for Facial Expression Recognition
Fan Zhang
Xiaobao Guo
Xiaojiang Peng
Alex C. Kot
11
0
0
14 Jan 2024
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Edward Sanderson
B. Matuszewski
10
2
0
11 Jan 2024
Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence
Ruizhuo Xu
Linzhi Huang
Mei Wang
Jiani Hu
Weihong Deng
ViT
MedIm
27
1
0
01 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
29
13
0
31 Dec 2023
Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian
Lijie Fan
Kaifeng Chen
Dina Katabi
Dilip Krishnan
Phillip Isola
6
43
0
28 Dec 2023
Rejuvenating image-GPT as Strong Visual Representation Learners
Sucheng Ren
Zeyu Wang
Hongru Zhu
Junfei Xiao
Alan L. Yuille
Cihang Xie
VLM
34
7
0
04 Dec 2023
Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning
Utku Mert Topcuoglu
Erdem Akagündüz
27
1
0
02 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
19
13
0
02 Dec 2023
Improve Supervised Representation Learning with Masked Image Modeling
Kaifeng Chen
Daniel M. Salz
Huiwen Chang
Kihyuk Sohn
Dilip Krishnan
Mojtaba Seyedhosseini
SSL
ViT
14
2
0
01 Dec 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
30
0
0
28 Nov 2023
Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder
Xiaohao Xu
28
0
0
23 Nov 2023
PersonMAE: Person Re-Identification Pre-Training with Masked AutoEncoders
Hezhen Hu
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Lu Yuan
Dong Chen
Houqiang Li
13
3
0
08 Nov 2023
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao
Bingkun Huang
Sen Xing
Gangshan Wu
Yu Qiao
Limin Wang
27
5
0
06 Nov 2023