BEiT: BERT Pre-Training of Image Transformers


15 June 2021
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
    ViT

Papers citing "BEiT: BERT Pre-Training of Image Transformers"

50 / 1,790 papers shown
Title
Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding
Run Shao
Zhaoyang Zhang
Chao Tao
Yunsheng Zhang
Chengli Peng
Haifeng Li
VLM
35
5
0
27 Mar 2024
Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis
Badri N. Patro
Suhas Ranganath
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
43
2
0
26 Mar 2024
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël
Renaud Vandeghen
A. Cioppa
Silvio Giancola
Bernard Ghanem
Marc Van Droogenbroeck
ViT
43
6
0
26 Mar 2024
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu
Yingwei Pan
Yehao Li
Ting Yao
Zhenglong Sun
Tao Mei
C. Chen
50
24
0
25 Mar 2024
Adversarially Masked Video Consistency for Unsupervised Domain Adaptation
Xiaoyu Zhu
Junwei Liang
Po-Yao Huang
Alex Hauptmann
32
1
0
24 Mar 2024
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Weihao Ye
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
MoE
46
1
0
22 Mar 2024
Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation
Wenlve Zhou
Zhiheng Zhou
Tianlei Wang
Delu Zeng
35
0
0
22 Mar 2024
Training point-based deep learning networks for forest segmentation with synthetic data
Francisco Raverta Capua
Juan Schandin
Pablo De Cristóforis
3DPC
38
3
0
21 Mar 2024
On Pretraining Data Diversity for Self-Supervised Learning
Hasan Hammoud
Tuhin Das
Fabio Pizzati
Philip H. S. Torr
Adel Bibi
Bernard Ghanem
103
2
0
20 Mar 2024
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Di Wang
Jing Zhang
Minqiang Xu
Lin Liu
Dongsheng Wang
...
Chengxi Han
Haonan Guo
Bo Du
Dacheng Tao
L. Zhang
37
44
0
20 Mar 2024
ViTGaze: Gaze Following with Interaction Features in Vision Transformers
Yuehao Song
Xinggang Wang
Jingfeng Yao
Wenyu Liu
Jinglin Zhang
Xiangmin Xu
ViT
49
2
0
19 Mar 2024
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
Xiaojie Li
Yibo Yang
Xiangtai Li
Jianlong Wu
Yue Yu
Bernard Ghanem
Min Zhang
SSL
34
6
0
18 Mar 2024
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
Wangbo Zhao
Jiasheng Tang
Yizeng Han
Yibing Song
Kai Wang
Gao Huang
F. Wang
Yang You
40
11
0
18 Mar 2024
Domain-Guided Masked Autoencoders for Unique Player Identification
Bavesh Balaji
Jerrin Bright
Sirisha Rambhatla
Yuhao Chen
Alexander Wong
John S. Zelek
David A Clausi
19
1
0
17 Mar 2024
Rethinking Multi-view Representation Learning via Distilled Disentangling
Guanzhou Ke
Bo Wang
Xiaoli Wang
Shengfeng He
37
3
0
16 Mar 2024
Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Baoquan Zhang
Huaibin Wang
Chuyao Luo
Xutao Li
Guotao Liang
Yunming Ye
Xiaochen Qi
Yao He
37
11
0
15 Mar 2024
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
Xiaohuan Pei
Tao Huang
Chang Xu
Mamba
27
88
0
15 Mar 2024
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong
Shilin Yan
Renrui Zhang
Wanyun Li
Xinyu Zhou
...
Kaixun Jiang
Yiting Chen
Jinglun Li
Zhaoyu Chen
Wenqiang Zhang
VLM
34
38
0
14 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
73
0
14 Mar 2024
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
Afrina Tabassum
Dung N. Tran
Trung D. Q. Dang
Ismini Lourentzou
K. Koishida
47
0
0
14 Mar 2024
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
Kaichao You
Runsheng Bai
Meng Cao
Jianmin Wang
Ion Stoica
Mingsheng Long
VLM
33
0
0
14 Mar 2024
Faceptor: A Generalist Model for Face Perception
Lixiong Qin
Mei Wang
Xuannan Liu
Yuhang Zhang
Weihong Deng
Xiaoshuai Song
Weiran Xu
Weihong Deng
CVBM
32
6
0
14 Mar 2024
Explore In-Context Segmentation via Latent Diffusion Models
Chaoyang Wang
Xiangtai Li
Henghui Ding
Lu Qi
Jiangning Zhang
Yunhai Tong
Chen Change Loy
Shuicheng Yan
DiffM
63
6
0
14 Mar 2024
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
46
2
0
13 Mar 2024
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
Shahaf Arica
Or Rubin
Sapir Gershov
S. Laufer
29
6
0
12 Mar 2024
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Han Qiu
Jiaxing Huang
Peng Gao
Lewei Lu
Xiaoqin Zhang
Shijian Lu
48
4
0
12 Mar 2024
AACP: Aesthetics assessment of children's paintings based on self-supervised learning
Shiqi Jiang
Ning Li
Chen Shi
Liping Guo
Changbo Wang
Chenhui Li
30
0
0
12 Mar 2024
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia
Xinliang Wang
Feng Lv
Xin Hao
Yifeng Shi
ViT
31
45
0
12 Mar 2024
Noise-powered Multi-modal Knowledge Graph Representation Framework
Zhuo Chen
Yin Fang
Yichi Zhang
Lingbing Guo
Jiaoyan Chen
Hua-zeng Chen
Wen Zhang
26
0
0
11 Mar 2024
Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models
Philip Harris
Michael Kagan
J. Krupa
B. Maier
Nathaniel Woodward
38
13
0
11 Mar 2024
$\textbf{S}^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting
Zijie Pan
Yushan Jiang
Sahil Garg
Anderson Schneider
Yuriy Nevmyvaka
Dongjin Song
AI4TS
47
6
0
09 Mar 2024
OmniJet-$α$: The first cross-task foundation model for particle physics
Joschka Birk
Anna Hallin
Gregor Kasieczka
AI4CE
35
22
0
08 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
40
10
0
08 Mar 2024
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
Liting Lin
Heng Fan
Zhipeng Zhang
Yaowei Wang
Yong-mei Xu
Haibin Ling
44
24
0
08 Mar 2024
UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining
Sheng-Hsuan Peng
Aishwarya Chakravarthy
Seongmin Lee
Xiaojing Wang
Rajarajeswari Balasubramaniyan
Duen Horng Chau
LMTD
44
0
0
07 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
37
12
0
06 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
67
12
0
05 Mar 2024
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV
Jaime Spencer
Chris Russell
Simon Hadfield
Richard Bowden
MDE
43
7
0
03 Mar 2024
Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection
Chenchen Tao
Chong Wang
Yuexian Zou
Xiaohao Peng
Jiafei Wu
Jiangbo Qian
34
2
0
02 Mar 2024
Rethinking cluster-conditioned diffusion models
Nikolas Adaloglou
Tim Kaiser
Félix D. P. Michels
M. Kollmann
VLM
37
3
0
01 Mar 2024
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu
Jianlin Su
Bo-Wen Zhang
Chunhua Shen
MLLM
44
10
0
01 Mar 2024
Learning and Leveraging World Models in Visual Representation Learning
Q. Garrido
Mahmoud Assran
Nicolas Ballas
Adrien Bardes
Laurent Najman
Yann LeCun
SSL
46
24
0
01 Mar 2024
Data-efficient Event Camera Pre-training via Disentangled Masked Modeling
Zhenpeng Huang
Chao Li
Hao Chen
Yongjian Deng
Yifeng Geng
Limin Wang
45
2
0
01 Mar 2024
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Chen Duan
Pei Fu
Shan Guo
Qianyi Jiang
Xiaoming Wei
VLM
46
5
0
01 Mar 2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
Haowei Liu
Yaya Shi
Haiyang Xu
Chunfen Yuan
Qinghao Ye
...
Mingshi Yan
Ji Zhang
Fei Huang
Bing Li
Weiming Hu
VLM
35
0
0
01 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
86
178
0
29 Feb 2024
MaskFi: Unsupervised Learning of WiFi and Vision Representations for Multimodal Human Activity Recognition
Jianfei Yang
Shijie Tang
Yuecong Xu
Yunjiao Zhou
Lihua Xie
27
4
0
29 Feb 2024
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei
Tao Chen
XiRuo Jiang
Huafeng Liu
Zeren Sun
Yazhou Yao
VGen
39
9
0
29 Feb 2024
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim
Minje Jang
Wonjun Yoon
Jisoo Lee
Donghyun Na
Sanghyun Woo
AI4CE
36
19
0
29 Feb 2024
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
Jiequan Cui
Beier Zhu
Xin Wen
Xiaojuan Qi
Bei Yu
Hanwang Zhang
25
7
0
28 Feb 2024