BEiT: BERT Pre-Training of Image Transformers

15 June 2021
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
    ViT

Papers citing "BEiT: BERT Pre-Training of Image Transformers"

50 / 1,788 papers shown
Multimodal Perception System for Real Open Environment
Yuyang Sha
21
0
0
10 Oct 2024
Self-Supervised Learning for Real-World Object Detection: a Survey
Alina Ciocarlan
Sidonie Lefebvre
S. L. Hégarat-Mascle
Arnaud Woiselle
ObjD
34
0
0
09 Oct 2024
MaskBlur: Spatial and Angular Data Augmentation for Light Field Image Super-Resolution
Wentao Chao
Fuqing Duan
Yulan Guo
Guanghui Wang
32
1
0
09 Oct 2024
Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers
Stephen Hausler
Peyman Moghadam
SSL
ViT
29
2
0
09 Oct 2024
A Deep Learning-Based Approach for Mangrove Monitoring
Lucas José Velôso de Souza
Ingrid Valverde Reis Zreik
Adrien Salem-Sermanet
Nacéra Seghouani
Lionel Pourchier
18
0
0
07 Oct 2024
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification
Benjamin Feuer
Jiawei Xu
Niv Cohen
Patrick Yubeaton
Govind Mittal
Chinmay Hegde
21
1
0
07 Oct 2024
Adaptive Masking Enhances Visual Grounding
Sen Jia
Lei Li
26
0
0
04 Oct 2024
Predictive Coding for Decision Transformer
T. Luu
Donghoon Lee
Chang D. Yoo
OffRL
58
1
0
04 Oct 2024
HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
Adrian Chan
Anupam Mijar
Mehreen Saeed
Chau-Wai Wong
Akram Khater
36
0
0
03 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
30
0
0
03 Oct 2024
TAEGAN: Generating Synthetic Tabular Data For Data Augmentation
Jiayu Li
Zilong Zhao
Kevin Yee
Uzair Javaid
Biplab Sikdar
LMTD
31
1
0
02 Oct 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Jiuxiang Gu
Bhiksha Raj
Zhe-nan Lin
VLM
34
18
0
02 Oct 2024
COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation
Mingzhen Sun
Weining Wang
Xinxin Zhu
Jing Liu
VGen
DiffM
31
0
0
02 Oct 2024
GADFA: Generator-Assisted Decision-Focused Approach for Opinion Expressing Timing Identification
Chung-Chi Chen
Hiroya Takamura
Ichiro Kobayashi
Yusuke Miyao
22
0
0
02 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
52
2
0
02 Oct 2024
UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception
Chuang Chen
X. Sun
Zhi Liu
31
0
0
27 Sep 2024
Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification
Salma Hassan
Hamad Al Hammadi
Ibrahim Mohammed
Muhammad Haris Khan
37
0
0
27 Sep 2024
You Only Speak Once to See
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
VOS
33
1
0
27 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
34
0
0
26 Sep 2024
First Place Solution to the ECCV 2024 BRAVO Challenge: Evaluating Robustness of Vision Foundation Models for Semantic Segmentation
Tommie Kerssies
Daan de Geus
Gijs Dubbelman
64
2
0
25 Sep 2024
Face Forgery Detection with Elaborate Backbone
Zonghui Guo
Y. Liu
Jie Zhang
Haiyong Zheng
Shiguang Shan
AAML
CVBM
25
1
0
25 Sep 2024
ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features
Xin Wei
Yaling Tao
Changde Du
Gangming Zhao
Yizhou Yu
Jinpeng Li
33
0
0
24 Sep 2024
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
Zixiao Wang
Hongtao Xie
Yuxin Wang
Yadong Qu
Fengjun Guo
Pengwei Liu
DiffM
33
0
0
20 Sep 2024
MEXMA: Token-level objectives improve sentence representations
Joao Maria Janeiro
Benjamin Piwowarski
Patrick Gallinari
Loïc Barrault
26
1
0
19 Sep 2024
Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh
Samuel Klein
François Charton
Tobias Golling
Lukas Heinrich
Michael Kagan
Ines Ochoa
Margarita Osadchy
27
7
0
19 Sep 2024
Frequency-Guided Spatial Adaptation for Camouflaged Object Detection
Shizhou Zhang
Dexuan Kong
Yinghui Xing
Yue Lu
Lingyan Ran
Guoqiang Liang
Hexu Wang
Yanning Zhang
30
5
0
19 Sep 2024
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
Zichen Jeff Cui
Hengkai Pan
Aadhithya Iyer
Siddhant Haldar
Lerrel Pinto
VGen
26
10
0
18 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Boxin Wang
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Wei Ping
MLLM
VLM
LRM
40
51
0
17 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Bilal Faye
Hanane Azzag
M. Lebbah
ObjD
32
0
0
17 Sep 2024
A Comparative Study of Open Source Computer Vision Models for Application on Small Data: The Case of CFRP Tape Laying
Thomas Fraunholz
Dennis Rall
Tim Kohler
Alfons Schuster
M. Mayer
Lars Larsen
35
0
0
16 Sep 2024
Personalized Speech Emotion Recognition in Human-Robot Interaction using Vision Transformers
Ruchik Mishra
Andrew Frye
M. M. Rayguru
Dan O. Popa
35
1
0
16 Sep 2024
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Amin Karimi Monsefi
Mengxi Zhou
Nastaran Karimi Monsefi
Ser-Nam Lim
Wei-Lun Chao
R. Ramnath
44
1
0
16 Sep 2024
SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
Chenyang Lei
Liyi Chen
Jun Cen
Xiao Chen
Zhen Lei
Felix Heide
Ziwei Liu
Qifeng Chen
Zhaoxiang Zhang
44
0
0
12 Sep 2024
Data Collection-free Masked Video Modeling
Yuchi Ishikawa
Masayoshi Kondo
Yoshimitsu Aoki
ViT
19
1
0
10 Sep 2024
Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks
Teresa Dorszewski
Lenka Tětková
Lorenz Linhardt
Lars Kai Hansen
HAI
36
0
0
10 Sep 2024
EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation
Nischal Khanal
Shivanand Venkanna Sheshappanavar
MDE
42
0
0
10 Sep 2024
DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
Amin Karimi Monsefi
Kishore Prakash Sailaja
Ali Alilooee
Ser-Nam Lim
R. Ramnath
VLM
35
6
0
10 Sep 2024
VFA: Vision Frequency Analysis of Foundation Models and Human
Mohammad Javad Darvishi Bayazi
Md Rifat Arefin
Jocelyn Faubert
Irina Rish
VLM
37
1
0
09 Sep 2024
VidLPRO: A Video-Language Pre-training Framework for Robotic and Laparoscopic Surgery
Mohammadmahdi Honarmand
Muhammad Abdullah Jamal
Omid Mohareri
58
1
0
07 Sep 2024
Introducing a Class-Aware Metric for Monocular Depth Estimation: An Automotive Perspective
Tim Bader
Leon Eisemann
Adrian Pogorzelski
Namrata Jangid
Attila B. Kis
38
0
0
06 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
70
15
0
05 Sep 2024
Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images
Wenlin Li
Yucheng Xu
Xiaoqing Zheng
Suoya Han
Jun Wang
Xiaobo Sun
36
0
0
03 Sep 2024
Learning Task-Specific Sampling Strategy for Sparse-View CT Reconstruction
Liutao Yang
Jiahao Huang
Yingying Fang
Angelica I Aviles-Rivero
Carola-Bibiane Schonlieb
Daoqiang Zhang
Guang Yang
30
0
0
03 Sep 2024
SITUATE: Indoor Human Trajectory Prediction through Geometric Features and Self-Supervised Vision Representation
Luigi Capogrosso
Andrea Toaiari
Andrea Avogaro
Uzair Khan
Aditya Jivoji
Franco Fummi
Marco Cristani
31
0
0
01 Sep 2024
Self-Supervised Vision Transformers for Writer Retrieval
Tim Raven
Arthur Matei
Gernot A. Fink
ViT
20
0
0
01 Sep 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
M. Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
45
3
0
30 Aug 2024
Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning
Zhengqing Gao
Xiang Ao
Xu-Yao Zhang
Cheng-Lin Liu
VLM
VPVLM
34
0
0
29 Aug 2024
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
Linyan Yang
Lukas Hoyer
Mark Weber
Tobias Fischer
Dengxin Dai
Laura Leal-Taixé
Marc Pollefeys
Daniel Cremers
Luc Van Gool
MDE
32
3
0
29 Aug 2024
A Preliminary Exploration Towards General Image Restoration
Xiangtao Kong
Jinjin Gu
Yihao Liu
Wenlong Zhang
Xiangyu Chen
Yu Qiao
Chao Dong
DiffM
38
2
0
27 Aug 2024
InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth
Cho-Ying Wu
Quankai Gao
Chin-Cheng Hsu
Te-Lin Wu
Jing-Wen Chen
Ulrich Neumann
MDE
27
0
0
25 Aug 2024