2106.08254
BEiT: BERT Pre-Training of Image Transformers
15 June 2021
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
Papers citing
"BEiT: BERT Pre-Training of Image Transformers"
50 / 1,788 papers shown
Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
Paul Koch
Jörg Krüger
Ankit Chowdhury
O. Heimann
MDE
55
0
0
25 Mar 2025
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Stefan Stojanov
David Wendt
Seungwoo Kim
R. Venkatesh
Kevin T. Feigelis
Jiajun Wu
Daniel L. K. Yamins
SSL
71
0
0
25 Mar 2025
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham
Juan C. Caicedo
Bryan A. Plummer
44
0
0
25 Mar 2025
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies
Niccolò Cavagnero
Alexander Hermans
Narges Norouzi
Giuseppe Averta
Bastian Leibe
Gijs Dubbelman
Daan de Geus
ViT
VLM
59
1
0
24 Mar 2025
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Yifei Zhang
Chang-Shu Liu
Jin Wei
Xiaomeng Yang
Yu Zhou
Can Ma
Xiangyang Ji
60
2
0
24 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Pengfei Zhu
Q. Hu
CLIP
48
0
0
24 Mar 2025
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Robin Hesse
Doğukan Bağcı
Bernt Schiele
Simone Schaub-Meyer
Stefan Roth
VLM
57
0
0
21 Mar 2025
Halton Scheduler For Masked Generative Image Transformer
Victor Besnier
Mickael Chen
David Hurych
Eduardo Valle
Matthieu Cord
52
1
0
21 Mar 2025
Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion
Yu Sun
Yin Li
R.-H. Sun
Chunhui Liu
Fangming Zhou
Ze Jin
Linjie Wang
Xiang Shen
Zhuolin Hao
Hongyu Xiong
VLM
48
0
0
21 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
53
1
0
21 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
59
0
0
20 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
P. Radeva
54
0
0
19 Mar 2025
Utilization of Neighbor Information for Image Classification with Different Levels of Supervision
Gihan Jayatilaka
Abhinav Shrivastava
M. Gwilliam
64
0
0
18 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
53
1
0
17 Mar 2025
8-Calves Image dataset
Xuyang Fang
S. Hannuna
Neill D. F. Campbell
113
0
0
17 Mar 2025
MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network
Vrushank Ahire
Kunal Shah
Mudasir Nazir Khan
Nikhil Pakhale
L. Sookha
M. A. Ganaie
Abhinav Dhall
65
0
0
16 Mar 2025
Shape Bias and Robustness Evaluation via Cue Decomposition for Image Classification and Segmentation
Edgar Heinert
Thomas Gottwald
Annika Mütze
Matthias Rottmann
62
0
0
16 Mar 2025
Self-Supervised Pretraining for Fine-Grained Plankton Recognition
Joona Kareinen
T. Eerola
K. Kraft
L. Lensu
S. Suikkanen
H. Kalviainen
SSL
148
0
0
14 Mar 2025
Unlocking Open-Set Language Accessibility in Vision Models
Fawaz Sammani
Jonas Fischer
Nikos Deligiannis
VLM
53
0
0
14 Mar 2025
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Fengxiang Wang
H. Wang
Y. Wang
Di Wang
Mingshuo Chen
...
Yangang Sun
Shuo Wang
L. Lan
Wenjing Yang
Jing Zhang
Mamba
77
3
0
13 Mar 2025
Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning
Wenyi Lian
Joakim Lindblad
Patrick Micke
Natasa Sladoje
62
0
0
12 Mar 2025
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
Hariprasath Govindarajan
Maciej K. Wozniak
Marvin Klingner
Camille Maurice
B. R. Kiran
S. Yogamani
53
0
0
12 Mar 2025
Effective and Efficient Masked Image Generation Models
Zebin You
Jingyang Ou
Xiaolu Zhang
Jun Hu
Jun Zhou
Chongxuan Li
DiffM
VLM
54
1
0
10 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
41
0
0
10 Mar 2025
Alligat0R: Pre-Training Through Co-Visibility Segmentation for Relative Camera Pose Regression
Thibaut Loiseau
Guillaume Bourmaud
Vincent Lepetit
64
0
0
10 Mar 2025
Brain Inspired Adaptive Memory Dual-Net for Few-Shot Image Classification
Kexin Di
Xiuxing Li
Yuyang Han
Ziyu Li
Qing Li
Xia Wu
VLM
53
0
0
10 Mar 2025
CLICv2: Image Complexity Representation via Content Invariance Contrastive Learning
Shipeng Liu
Liang Zhao
Dengfeng Chen
SSL
96
0
0
09 Mar 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
60
0
0
08 Mar 2025
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation
Suhwan Cho
Seunghoon Lee
Minhyeok Lee
Jungho Lee
Sangyoun Lee
VOS
77
0
0
05 Mar 2025
Task-Agnostic Attacks Against Vision Foundation Models
Brian Pulfer
Yury Belousov
Vitaliy Kinakh
Teddy Furon
S. Voloshynovskiy
AAML
68
0
0
05 Mar 2025
Is Pre-training Applicable to the Decoder for Dense Prediction?
Chao Ning
Wanshui Gan
Weihao Xuan
Naoto Yokoya
48
0
0
05 Mar 2025
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
Jingzhou Luo
Y. Liu
Weixing Chen
Zhen Li
Y. Wang
G. Li
Liang Lin
65
2
0
05 Mar 2025
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Paul Janson
Vaibhav Singh
Paria Mehrbod
Adam Ibrahim
Irina Rish
Eugene Belilovsky
Benjamin Thérien
CLL
75
0
0
04 Mar 2025
Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation
Wenzhao Xiang
Chang Liu
Hongyang Yu
Xilin Chen
31
0
0
02 Mar 2025
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Tianyi Wang
Jianan Fan
Dingxin Zhang
Dongnan Liu
Yong-quan Xia
Heng Huang
Weidong Cai
36
0
0
01 Mar 2025
TimesBERT: A BERT-Style Foundation Model for Time Series Understanding
Haoran Zhang
Yong Liu
Yunzhong Qiu
Haixuan Liu
Zhongyi Pei
Jianmin Wang
Mingsheng Long
AI4TS
42
0
0
28 Feb 2025
Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds
Mohamed Abdelsamad
Michael Ulrich
Claudius Gläser
Abhinav Valada
3DPC
42
0
0
27 Feb 2025
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
Siyu Jiao
Gengwei Zhang
Yinlong Qian
Jiancheng Huang
Yao Zhao
Humphrey Shi
Lin Ma
Y. X. Wei
Zequn Jie
VLM
39
1
0
27 Feb 2025
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
Carlos Vélez García
Miguel Cazorla
Jorge Pomares
49
0
0
25 Feb 2025
Arrhythmia Classification from 12-Lead ECG Signals Using Convolutional and Transformer-Based Deep Learning Models
Andrei Apostol
Maria Nutu
60
0
0
25 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
65
8
0
24 Feb 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
92
2
0
24 Feb 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Poppel
Sepp Hochreiter
Johannes Brandstetter
VLM
58
43
0
24 Feb 2025
Tight Clusters Make Specialized Experts
Stefan K. Nielsen
R. Teo
Laziz U. Abdullaev
Tan M. Nguyen
MoE
59
2
0
21 Feb 2025
Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
Yongqi Dong
Xingmin Lu
Ruohan Li
Wei Song
B. Arem
Haneen Farah
ViT
105
1
0
21 Feb 2025
Intensity-Spatial Dual Masked Autoencoder for Multi-Scale Feature Learning in Chest CT Segmentation
Yuexing Ding
Jun Wang
H. Lyu
86
0
0
17 Feb 2025
Hyperspherical Energy Transformer with Recurrent Depth
Yunzhe Hu
Difan Zou
Dong Xu
41
0
0
17 Feb 2025
Harnessing Vision Models for Time Series Analysis: A Survey
Jingchao Ni
Ziming Zhao
ChengAo Shen
Hanghang Tong
Dongjin Song
Wei Cheng
Dongsheng Luo
Haifeng Chen
AI4TS
77
1
0
13 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
101
0
0
12 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
86
1
0
10 Feb 2025