BEiT: BERT Pre-Training of Image Transformers

15 June 2021
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
    ViT
arXiv: 2106.08254

Papers citing "BEiT: BERT Pre-Training of Image Transformers"

50 / 1,788 papers shown
Title
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders
Carlos Hinojosa
Shuming Liu
Bernard Ghanem
26
2
0
17 Jul 2024
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification
Markus Marks
Manuel Knott
Neehar Kondapaneni
Elijah Cole
T. Defraeye
Fernando Pérez-Cruz
Pietro Perona
SSL
45
2
0
16 Jul 2024
Encapsulating Knowledge in One Prompt
Qi Li
Runpeng Yu
Xinchao Wang
VLM
KELM
49
3
0
16 Jul 2024
STARS: Self-supervised Tuning for 3D Action Recognition in Skeleton Sequences
Soroush Mehraban
Mohammad Javad Rajabi
Babak Taati
3DPC
29
0
0
15 Jul 2024
Joint-Embedding Predictive Architecture for Self-Supervised Learning of Mask Classification Architecture
Donghee Kim
Sungduk Cho
Hyeonwoo Cho
Chanmin Park
Jinyoung Kim
Won Hwa Kim
47
0
0
15 Jul 2024
Representation Learning and Identity Adversarial Training for Facial Behavior Understanding
Mang Ning
A. A. Salah
Itir Onal Ertugrul
CVBM
80
4
0
15 Jul 2024
Pre-training Point Cloud Compact Model with Partial-aware Reconstruction
Yaohua Zha
Yanzi Wang
Tao Dai
Shu-Tao Xia
40
0
0
12 Jul 2024
On the Role of Discrete Tokenization in Visual Representation Learning
Tianqi Du
Yifei Wang
Yisen Wang
49
7
0
12 Jul 2024
Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness
Honghao Chen
Yurong Zhang
Xiaokun Feng
Xiangxiang Chu
Kaiqi Huang
AAML
42
5
0
12 Jul 2024
Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports
Yanfu Yan
Nathan Cooper
Oscar Chaparro
Kevin Moran
Denys Poshyvanyk
43
5
0
11 Jul 2024
15M Multimodal Facial Image-Text Dataset
Dawei Dai
Yutang Li
Yingge Liu
Mingming Jia
Zhang YuanHui
Guoyin Wang
VLM
28
7
0
11 Jul 2024
Disentangling Masked Autoencoders for Unsupervised Domain Generalization
An Zhang
Han Wang
Xiang Wang
Tat-Seng Chua
49
0
0
10 Jul 2024
Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation
Filipe Lauar
Valentin Laurent
29
0
0
09 Jul 2024
MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction
Jun-Hyung Park
Yeachan Kim
Mingyu Lee
Hyuntae Park
SangKeun Lee
32
0
0
09 Jul 2024
AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
Yuheng Li
Tianyu Luan
Yizhou Wu
Shaoyan Pan
Yenho Chen
Xiaofeng Yang
40
4
0
09 Jul 2024
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
Ethan Chern
Jiadi Su
Yan Ma
Pengfei Liu
MLLM
29
27
0
08 Jul 2024
MobileFlow: A Multimodal LLM For Mobile GUI Agent
Songqin Nong
Jiali Zhu
Rui Wu
Jiongchao Jin
Shuo Shan
Xiutian Huang
Wenhao Xu
27
7
0
05 Jul 2024
Precision at Scale: Domain-Specific Datasets On-Demand
Jesús M. Rodríguez-de-Vera
Imanol G. Estepa
Ignacio Sarasúa
Bhalaji Nagarajan
P. Radeva
36
2
0
03 Jul 2024
Advanced Smart City Monitoring: Real-Time Identification of Indian Citizen Attributes
Shubham Kale
Shashank Sharma
Abhilash Khuntia
18
0
0
03 Jul 2024
Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
Hao Dong
Eleni Chatzi
Olga Fink
26
3
0
01 Jul 2024
Mask and Compress: Efficient Skeleton-based Action Recognition in Continual Learning
Matteo Mosconi
Andriy Sorokin
Aniello Panariello
Angelo Porrello
Jacopo Bonato
Marco Cotogni
Luigi Sabetta
Simone Calderara
Rita Cucchiara
CLL
34
1
0
01 Jul 2024
Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck
Yangzhou Jiang
Yinxin Lin
Yaoming Wang
Teng Li
Bilian Ke
Bingbing Ni
CVBM
40
1
0
29 Jun 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
M. Zhang
Tat-Seng Chua
Shuicheng Yan
AI4TS
47
40
0
27 Jun 2024
WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images
Yannik Glaser
J. Stopa
Linnea M. Wolniewicz
Ralph Foster
Doug Vandemark
A. Mouche
Bertrand Chapron
Peter Sadowski
25
1
0
26 Jun 2024
Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation
H. Kerdegari
Kyle Higgins
Dennis Veselkov
I. Laponogov
I. Poļaka
...
Junior Andrea Pescino
M. Leja
M. Dinis-Ribeiro
T. F. Kanonnikoff
Kirill Veselkov
35
3
0
26 Jun 2024
Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model
Zhuo Zheng
Stefano Ermon
Dongjun Kim
Liangpei Zhang
Yanfei Zhong
DiffM
45
20
0
26 Jun 2024
Unified Auto-Encoding with Masked Diffusion
Philippe Hansen-Estruch
S. Vishwanath
Amy Zhang
Manan Tomar
DiffM
60
1
0
25 Jun 2024
Investigating Self-Supervised Methods for Label-Efficient Learning
S. Nandam
Sara Atito
Zhenhua Feng
Josef Kittler
Muhammad Awais
VLM
42
0
0
25 Jun 2024
Pseudo Labelling for Enhanced Masked Autoencoders
S. Nandam
Sara Atito
Zhenhua Feng
Josef Kittler
Muhammad Awais
64
1
0
25 Jun 2024
GMT: Guided Mask Transformer for Leaf Instance Segmentation
Feng Chen
Sotirios A. Tsaftaris
M. Giuffrida
20
1
0
24 Jun 2024
BrainMAE: A Region-aware Self-supervised Learning Framework for Brain Signals
Yifan Yang
Yutong Mao
Xufu Liu
Xiao Liu
32
1
0
24 Jun 2024
LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery
Xiaowen Ma
Rongrong Lian
Zhenkai Wu
Hongbo Guo
Mengting Ma
Sensen Wu
Zhenhong Du
Siyang Song
Wei Zhang
44
4
0
24 Jun 2024
Rethinking Remote Sensing Change Detection With A Mask View
Xiaowen Ma
Zhenkai Wu
Rongrong Lian
Wei Zhang
Siyang Song
29
3
0
21 Jun 2024
Revealing Vision-Language Integration in the Brain with Multimodal Networks
Vighnesh Subramaniam
C. Conwell
Christopher Wang
Gabriel Kreiman
Boris Katz
Ignacio Cases
Andrei Barbu
35
8
0
20 Jun 2024
Semantic Graph Consistency: Going Beyond Patches for Regularizing Self-Supervised Vision Transformers
Chaitanya Devaguptapu
Sumukh K. Aithal
Shrinivas Ramasubramanian
Moyuru Yamada
Manohar Kaul
ViT
34
0
0
18 Jun 2024
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
Tianren Ma
Lingxi Xie
Yunjie Tian
Boyu Yang
Yuan Zhang
42
0
0
17 Jun 2024
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Di Wang
Meiqi Hu
Yao Jin
Yuchun Miao
Jiaqi Yang
...
Lefei Zhang
Chen Wu
Bo Du
Dacheng Tao
Liangpei Zhang
61
25
0
17 Jun 2024
SemanticMIM: Marring Masked Image Modeling with Semantics Compression for General Visual Representation
Yike Yuan
Huanzhang Dou
Fengjun Guo
Xi Li
36
2
0
15 Jun 2024
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi
Svetlana Orlova
Daan de Geus
Gijs Dubbelman
ViT
FedML
48
3
0
14 Jun 2024
Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation
B. B. Englert
Fabrizio J. Piva
Tommie Kerssies
Daan de Geus
Gijs Dubbelman
29
10
0
14 Jun 2024
Cross-view geo-localization: a survey
Abhilash Durgam
Sidike Paheding
Vikas Dhiman
Vijay Devabhaktuni
ObjD
29
2
0
14 Jun 2024
Depth Anything V2
Lihe Yang
Bingyi Kang
Zilong Huang
Zhen Zhao
Xiaogang Xu
Jiashi Feng
Hengshuang Zhao
DiffM
VLM
MDE
59
323
0
13 Jun 2024
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Roman Bachmann
Oğuzhan Fatih Kar
David Mizrahi
Ali Garjani
Mingfei Gao
David Griffiths
Jiaming Hu
Afshin Dehghan
Amir Zamir
MoE
VLM
MLLM
38
14
0
13 Jun 2024
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video
Hector A. Valdez
Kyle Min
Subarna Tripathi
VLM
39
1
0
13 Jun 2024
Unveiling Incomplete Modality Brain Tumor Segmentation: Leveraging Masked Predicted Auto-Encoder and Divergence Learning
Zhongao Sun
Jiameng Li
Yuhan Wang
Jiarong Cheng
Qing Zhou
Chun Li
MedIm
28
0
0
12 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
48
2
0
12 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
48
81
0
11 Jun 2024
Image and Video Tokenization with Binary Spherical Quantization
Yue Zhao
Yuanjun Xiong
Philipp Krahenbuhl
39
17
0
11 Jun 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang
Xizhou Zhu
Jinguo Zhu
Weijie Su
Junjie Wang
...
Lewei Lu
Bin Li
Jie Zhou
Yu Qiao
Jifeng Dai
VLM
CLIP
41
4
0
11 Jun 2024
Autoregressive Pretraining with Mamba in Vision
Sucheng Ren
Xianhang Li
Haoqin Tu
Feng Wang
Fangxun Shu
...
L. Yang
Peng Wang
Heng Wang
Alan Yuille
Cihang Xie
Mamba
70
9
0
11 Jun 2024