2106.08254
BEiT: BERT Pre-Training of Image Transformers
15 June 2021
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
Papers citing
"BEiT: BERT Pre-Training of Image Transformers"
50 / 1,788 papers shown
Title
A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
Jie Zhu
Jirong Zha
Ding Li
Leye Wang
24
0
0
15 May 2025
Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records
Yili He
Yan Zhu
Peiyao Fu
Ruijie Yang
Tianyi Chen
Zhihua Wang
Quanlin Li
Pinghong Zhou
X. J. Yang
Shuo Wang
MedIm
VLM
20
0
0
14 May 2025
A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder Learning
Berkay Guler
Giovanni Geraci
Hamid Jafarkhani
24
0
0
14 May 2025
TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series
Xiaolei Qin
Di Wang
J. Zhang
Fengxiang Wang
Xin Su
Bo Du
Liangpei Zhang
AI4TS
24
0
0
13 May 2025
VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation
B. K. Das
Ajay Singh
Gengyan Zhao
Han Liu
Thomas J. Re
D. Comaniciu
Eli Gibson
Andreas K. Maier
ViT
MedIm
26
0
0
13 May 2025
Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection
Ayush Rai
Kyle Min
Tarun Krishna
Feiyan Hu
A. Smeaton
Noel E. O'Connor
VGen
24
0
0
13 May 2025
Joint Low-level and High-level Textual Representation Learning with Multiple Masking Strategies
Zhengmi Tang
Yuto Mitsui
Tomo Miyazaki
S. Omachi
31
0
0
11 May 2025
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
Yicheng Song
Tiancheng Lin
Die Peng
Su Yang
Yi Xu
MedIm
31
0
0
10 May 2025
OWT: A Foundational Organ-Wise Tokenization Framework for Medical Imaging
Sifan Song
Siyeop Yoon
Pengfei Jin
Sekeun Kim
Matthew Tivnan
...
Zhiliang Lyu
Dufan Wu
Ning Guo
Xiang Li
Quanzheng Li
OOD
ViT
64
0
0
08 May 2025
Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions?
Shashank Agnihotri
David Schader
Nico Sharei
Mehmet Ege Kaçar
M. Keuper
41
2
0
07 May 2025
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi
Eilif Muller
Shahab Bakhtiari
49
0
0
06 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
D. Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
51
0
0
05 May 2025
Improving Routing in Sparse Mixture of Experts with Graph of Tokens
Tam Minh Nguyen
Ngoc N. Tran
Khai Nguyen
Richard G. Baraniuk
MoE
59
0
0
01 May 2025
Vision Transformers in Precision Agriculture: A Comprehensive Survey
Saber Mehdipour
Seyed Abolghasem Mirroshandel
Seyed Amirhossein Tabatabaei
29
0
0
30 Apr 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
84
0
0
29 Apr 2025
GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
Sehyeong Jo
Gangjae Jang
Haesol Park
32
0
0
28 Apr 2025
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DiffM
31
0
0
22 Apr 2025
Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation
Johannes Spoecklberger
W. Lin
Pedro Hermosilla
Sivan Doveh
Horst Possegger
M. Jehanzeb Mirza
17
0
0
19 Apr 2025
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue
Yulin Wang
Chenxin Tao
Pan Liu
Shiji Song
Gao Huang
MedIm
26
0
0
18 Apr 2025
BeetleVerse: A study on taxonomic classification of ground beetles
S M Rayeed
Alyson East
Samuel Stevens
Sydne Record
Charles V. Stewart
21
0
0
18 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
X. Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
34
0
0
17 Apr 2025
Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch
Ilyass Moummad
René Heinrich
Alexis Joly
Bernhard Sick
Christoph Scholz
29
0
0
17 Apr 2025
SAR Object Detection with Self-Supervised Pretraining and Curriculum-Aware Sampling
Yasin Almalioglu
Andrzej Kucik
Geoffrey French
Dafni Antotsiou
Alexander Adam
Cedric Archambeau
21
0
0
17 Apr 2025
H$^3$GNNs: Harmonizing Heterophily and Homophily in GNNs via Joint Structural Node Encoding and Self-Supervised Learning
Rui Xue
Tianfu Wu
AI4CE
35
0
0
16 Apr 2025
Making Acoustic Side-Channel Attacks on Noisy Keyboards Viable with LLM-Assisted Spectrograms' "Typo" Correction
Seyyed Ali Ayati
Jin Hyun Park
Yichen Cai
Marcus Botacin
31
0
0
15 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
X. Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
26
2
0
14 Apr 2025
GFT: Gradient Focal Transformer
Boris Kriuk
Simranjit Kaur Gill
Shoaib Aslam
Amir Fakhrutdinov
31
0
0
14 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
37
0
0
12 Apr 2025
EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
Xiangyue Zhang
Jianfang Li
Jiaxu Zhang
Jianqiang Ren
Liefeng Bo
Zhigang Tu
27
0
0
12 Apr 2025
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
Qi Zhi Lim
C. Lee
K. Lim
Kalaiarasi Sonai Muthu Anbananthen
31
0
0
11 Apr 2025
Boosting the Class-Incremental Learning in 3D Point Clouds via Zero-Collection-Cost Basic Shape Pre-Training
Chao Qi
Jianqin Yin
Meng Chen
Yingchun Niu
Yuan Sun
3DPC
CLL
41
0
0
11 Apr 2025
EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture
Wenfeng Feng
Guoying Sun
26
0
0
09 Apr 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViT
SSL
46
0
0
08 Apr 2025
Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos
Zhi Zuo
Chenyi Zhuang
Zhiqiang Shen
Pan Gao
Jie Qin
3DPC
27
0
0
07 Apr 2025
Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images
Hamza Riaz
A. Smeaton
39
0
0
05 Apr 2025
Foundation Models for Time Series: A Survey
Siva Rama Krishna Kottapalli
Karthik Hubli
Sandeep Chandrashekhara
Garima Jain
Sunayana Hubli
Gayathri Botla
Ramesh Doddaiah
AI4TS
AI4CE
23
0
0
05 Apr 2025
MIMRS: A Survey on Masked Image Modeling in Remote Sensing
Shabnam Choudhury
Akhil Vasim
Michael Schmitt
Biplab Banerjee
33
0
0
04 Apr 2025
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Hanping Zhang
Yuhong Guo
OffRL
38
0
0
03 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
L. Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
46
0
0
01 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
55
0
0
01 Apr 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
33
0
0
30 Mar 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Chenkai Zhang
Yiming Lei
Zeming Liu
Qingjie Liu
Y. Wang
44
0
0
28 Mar 2025
Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets
Martin Kiss
Michal Hradiš
34
0
0
28 Mar 2025
Dual-Task Learning for Dead Tree Detection and Segmentation with Hybrid Self-Attention U-Nets in Aerial Imagery
Anis Ur Rahman
Einari Heinaro
Mete Ahishali
Samuli Junttila
40
1
0
27 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
W. Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
127
2
0
27 Mar 2025
SChanger: Change Detection from a Semantic Change and Spatial Consistency Perspective
Ziyu Zhou
Keyan Hu
Yutian Fang
Xiaoping Rui
78
0
0
26 Mar 2025
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Alex Jinpeng Wang
Linjie Li
Z. Yang
Lijuan Wang
Min Li
DiffM
68
0
0
26 Mar 2025
Enabling Heterogeneous Adversarial Transferability via Feature Permutation Attacks
Tao Wu
Tie Luo
AAML
84
0
0
26 Mar 2025
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Jiaheng Zhou
Yanfeng Zhou
Wei Fang
Yuxing Tang
Le Lu
Ge Yang
Mamba
184
0
0
26 Mar 2025
UFM: Unified Feature Matching Pre-training with Multi-Modal Image Assistants
Yide Di
Yun Liao
Hao Zhou
Kaijun Zhu
Qing Duan
Junhui Liu
Mingyu Lu
34
0
0
26 Mar 2025