ResearchTrend.AI
BEiT: BERT Pre-Training of Image Transformers

15 June 2021
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
    ViT

Papers citing "BEiT: BERT Pre-Training of Image Transformers"

50 / 1,788 papers shown
Title
Disentangled Generative Graph Representation Learning
Xinyue Hu
Zhibin Duan
Xinyang Liu
Yuxin Li
Bo Chen
Mingyuan Zhou
40
0
0
24 Aug 2024
FungiTastic: A multi-modal dataset and benchmark for image categorization
Lukáš Picek
Klara Janouskova
Milan Šulc
Jiří Matas
77
1
0
24 Aug 2024
Symmetric masking strategy enhances the performance of Masked Image Modeling
Khanh-Binh Nguyen
Chae Jung Park
32
0
0
23 Aug 2024
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Shunsuke Saito
VLM
38
64
0
22 Aug 2024
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
Chaoya Jiang
Jia Hongrui
Haiyang Xu
Wei Ye
Mengfan Dong
Ming Yan
Ji Zhang
Fei Huang
Shikun Zhang
VLM
48
1
0
22 Aug 2024
Macformer: Transformer with Random Maclaurin Feature Attention
Yuhan Guo
Lizhong Ding
Ye Yuan
Guoren Wang
46
0
0
21 Aug 2024
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation
Xiangyu Zhao
Yuehan Zhang
Wenlong Zhang
X. Wu
36
4
0
21 Aug 2024
Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?
Chen Liang
Qiang Guo
Xiaochao Qu
Luoqi Liu
Ting Liu
VOS
34
0
0
20 Aug 2024
Uniting contrastive and generative learning for event sequences models
Aleksandr Yugay
Alexey Zaytsev
AI4TS
32
1
0
19 Aug 2024
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality
Chaofan Tao
Gukyeong Kwon
Varad Gunjal
Hao Yang
Zhaowei Cai
Yonatan Dukler
Ashwin Swaminathan
R. Manmatha
Colin Jon Taylor
Stefano Soatto
CoGe
27
0
0
18 Aug 2024
MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection
Pengfei Cai
Yan Song
Kang Li
Haoyu Song
Ian Mcloughlin
31
5
0
16 Aug 2024
SpectralEarth: Training Hyperspectral Foundation Models at Scale
Nassim Ait Ali Braham
C. Albrecht
Julien Mairal
J. Chanussot
Yi Wang
X. Zhu
38
12
0
15 Aug 2024
SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training
Gengwei Zhang
Liyuan Wang
Guoliang Kang
Ling Chen
Yunchao Wei
VLM
CLL
37
2
0
15 Aug 2024
Membership Inference Attack Against Masked Image Modeling
Z. Li
Xinlei He
Ning Yu
Yang Zhang
42
1
0
13 Aug 2024
Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning
Xinrong Hu
Dewen Zeng
Yawen Wu
Xueyang Li
Yiyu Shi
ViT
MedIm
39
0
0
12 Aug 2024
HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training
Fenghe Tang
Ronghao Xu
Qingsong Yao
Xueming Fu
Quan Quan
Heqin Zhu
Zaiyi Liu
S. Kevin Zhou
SSL
MedIm
40
3
0
11 Aug 2024
PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture
Qiang Zheng
Chao Zhang
Jian Sun
30
1
0
10 Aug 2024
PersonViT: Large-scale Self-supervised Vision Transformer for Person Re-Identification
Bin Hu
Xinggang Wang
Wenyu Liu
ViT
33
3
0
10 Aug 2024
Enhancing Representation Learning of EEG Data with Masked Autoencoders
Yifei Zhou
Sitong Liu
39
0
0
09 Aug 2024
AggSS: An Aggregated Self-Supervised Approach for Class-Incremental Learning
Jayateja Kalla
Soma Biswas
SSL
31
0
0
08 Aug 2024
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
Rex Liu
Xin Liu
40
1
0
08 Aug 2024
Image-to-LaTeX Converter for Mathematical Formulas and Text
Daniil Gurgurov
Aleksey Morshnev
ViT
VLM
47
1
0
07 Aug 2024
Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
V. T. Truong
Luan Ba Dang
Long Bao Le
DiffM
MedIm
45
16
0
06 Aug 2024
Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models
Haonan Zheng
Wen Jiang
Xinyang Deng
Wenrui Li
VLM
AAML
21
2
0
06 Aug 2024
LEGO: Self-Supervised Representation Learning for Scene Text Images
Yujin Ren
Jiaxin Zhang
Lianwen Jin
SSL
31
0
0
04 Aug 2024
Unsupervised Representation Learning by Balanced Self Attention Matching
Daniel Shalam
Simon Korman
SSL
33
0
0
04 Aug 2024
Masked Angle-Aware Autoencoder for Remote Sensing Images
Zhihao Li
B. Hou
Siteng Ma
Zitong Wu
Xianpeng Guo
Bo Ren
Licheng Jiao
41
11
0
04 Aug 2024
Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers
Weijie Zheng
Xingjun Ma
Hanxun Huang
Zuxuan Wu
Yu-Gang Jiang
AAML
32
0
0
03 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
39
6
0
02 Aug 2024
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents
Yi Tu
Chong Zhang
Ya Guo
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
43
3
0
02 Aug 2024
POA: Pre-training Once for Models of All Sizes
Yingying Zhang
Xin Guo
Jiangwei Lao
Lei Yu
Lixiang Ru
Jian Wang
Guo Ye
Huimei He
Jingdong Chen
Ming Yang
65
1
0
02 Aug 2024
Text-Guided Video Masked Autoencoder
D. Fan
Jue Wang
Shuai Liao
Zhikang Zhang
Vimal Bhat
Xinyu Li
VGen
23
3
0
01 Aug 2024
AMAES: Augmented Masked Autoencoder Pretraining on Public Brain MRI Data for 3D-Native Segmentation
Asbjorn Munk
Jakob Ambsdorf
S. Llambias
Mads Nielsen
30
4
0
01 Aug 2024
Big Cooperative Learning
Yulai Cong
AI4CE
36
0
0
31 Jul 2024
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue
Anurag Das
Francis Engelmann
Siyu Tang
J. E. Lenssen
48
24
0
29 Jul 2024
Self-Supervised Learning for Text Recognition: A Critical Survey
Carlos Peñarrubia
J. J. Valero-Mas
Jorge Calvo-Zaragoza
69
1
0
29 Jul 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu
Yutong Xie
Zeyu Zhang
Minh Hieu Phan
Qi Chen
Ling-Hao Chen
Qi Wu
LM&MA
37
0
0
28 Jul 2024
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
Pulkit Kumar
Namitha Padmanabhan
Luke Luo
Sai Saketh Rambhatla
Abhinav Shrivastava
32
4
0
25 Jul 2024
Unsqueeze [CLS] Bottleneck to Learn Rich Representations
Qing Su
Shihao Ji
24
0
0
24 Jul 2024
PEEKABOO: Hiding parts of an image for unsupervised object localization
Hasib Zunair
A. Ben Hamza
SSL
36
0
0
24 Jul 2024
QPT V2: Masked Image Modeling Advances Visual Scoring
Qizhi Xie
Kun Yuan
Yunpeng Qu
Mingda Wu
Ming-hui Sun
Chao Zhou
Jihong Zhu
34
3
0
23 Jul 2024
A Multi-view Mask Contrastive Learning Graph Convolutional Neural Network for Age Estimation
Yiping Zhang
Yuntao Shou
Tao Meng
Wei Ai
Keqin Li
CVBM
43
10
0
23 Jul 2024
QueST: Self-Supervised Skill Abstractions for Learning Continuous Control
Atharva Mete
Haotian Xue
Albert Wilcox
Yongxin Chen
Animesh Garg
SSL
32
16
0
22 Jul 2024
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei
Abhinav Gupta
Pedro Morgado
SSL
47
7
0
22 Jul 2024
SIGMA: Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
49
3
0
22 Jul 2024
Towards Robust Vision Transformer via Masked Adaptive Ensemble
Fudong Lin
Jiadong Lou
Xu Yuan
Nianfeng Tzeng
ViT
AAML
28
1
0
22 Jul 2024
Sim-CLIP: Unsupervised Siamese Adversarial Fine-Tuning for Robust and Semantically-Rich Vision-Language Models
Md Zarif Hossain
Ahmed Imteaj
VLM
AAML
36
4
0
20 Jul 2024
Decoupled Prompt-Adapter Tuning for Continual Activity Recognition
Di Fu
Thanh Vinh Vo
Haozhe Ma
Tze-Yun Leong
27
0
0
20 Jul 2024
Downstream-Pretext Domain Knowledge Traceback for Active Learning
Beichen Zhang
Liang-Sheng Li
Zheng-Jun Zha
Jiebo Luo
Qingming Huang
28
0
0
20 Jul 2024
Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition
Gagan Bhatia
El Moatez Billah Nagoudi
Fakhraddin Alwajih
Muhammad Abdul-Mageed
32
3
0
18 Jul 2024