2106.08254
BEiT: BERT Pre-Training of Image Transformers
15 June 2021
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
Papers citing
"BEiT: BERT Pre-Training of Image Transformers"
50 / 1,790 papers shown
Title
Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization
Han Guo
Ramtin Hosseini
Ruiyi Zhang
Sai Ashish Somayajula
Ranak Roy Chowdhury
Rajesh K. Gupta
Pengtao Xie
33
0
0
28 Feb 2024
Vision Transformers with Natural Language Semantics
Young-Kyung Kim
Matías Di Martino
Guillermo Sapiro
ViT
23
5
0
27 Feb 2024
LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning
Shentong Mo
Yansen Wang
Xufang Luo
Dongsheng Li
VLM
38
1
0
27 Feb 2024
PreRoutGNN for Timing Prediction with Order Preserving Partition: Global Circuit Pre-training, Local Delay Learning and Attentional Cell Modeling
Ruizhe Zhong
Junjie Ye
Zhentao Tang
Shixiong Kai
Mingxuan Yuan
Jianye Hao
Junchi Yan
42
8
0
27 Feb 2024
Self-Supervised Pre-Training for Table Structure Recognition Transformer
Sheng-Hsuan Peng
Seongmin Lee
Xiaojing Wang
Rajarajeswari Balasubramaniyan
Duen Horng Chau
ViT
LMTD
44
0
0
23 Feb 2024
A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends
Abolfazl Younesi
Mohsen Ansari
Mohammadamin Fazli
A. Ejlali
Muhammad Shafique
Joerg Henkel
3DV
47
44
0
23 Feb 2024
Attention-Guided Masked Autoencoders For Learning Image Representations
Leon Sick
Dominik Engel
Pedro Hermosilla
Timo Ropinski
34
1
0
23 Feb 2024
Label-efficient multi-organ segmentation with a diffusion model
Yongzhi Huang
Jinxin Zhu
Haseeb Hassan
Liyilei Su
Jingyu Li
Binding Huang
Yun Peng
Jingyu Li
Jun Ma
Bingding Huang
DiffM
MedIm
36
0
0
23 Feb 2024
The Common Stability Mechanism behind most Self-Supervised Learning Approaches
Abhishek Jha
Matthew B. Blaschko
Yuki M. Asano
Tinne Tuytelaars
SSL
32
1
0
22 Feb 2024
Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion
Yichi Zhang
Zhuo Chen
Lei Liang
Hua-zeng Chen
Wen Zhang
53
4
0
22 Feb 2024
Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding
Yu-Qi Yang
Yufeng Guo
Yang Liu
3DPC
46
2
0
22 Feb 2024
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang
I-Hau Yeh
Hongpeng Liao
57
1,151
0
21 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
41
29
0
20 Feb 2024
LVCHAT: Facilitating Long Video Comprehension
Yu-Xiang Wang
Zeyuan Zhang
Julian McAuley
Zexue He
VLM
32
4
0
19 Feb 2024
Surround-View Fisheye Optics in Computer Vision and Simulation: Survey and Challenges
Daniel Jakab
B. Deegan
Sushil Sharma
E. Grua
Jonathan Horgan
Enda Ward
Pepijn Van De Ven
Anthony G. Scanlan
Ciarán Eising
32
9
0
19 Feb 2024
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDE
VLM
92
73
0
15 Feb 2024
Learning Low-Rank Feature for Thorax Disease Classification
Rajeev Goel
Utkarsh Nath
Yancheng Wang
Alvin C. Silva
Teresa Wu
Yingzhen Yang
22
0
0
14 Feb 2024
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning
Amir Ziai
Aneesh Vartakavi
VLM
VGen
32
0
0
09 Feb 2024
You've Got to Feel It To Believe It: Multi-Modal Bayesian Inference for Semantic and Property Prediction
Parker Ewen
Hao Chen
Yuzhen Chen
Anran Li
Anup Bagali
Gitesh Gunjal
Ram Vasudevan
28
5
0
08 Feb 2024
Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts
Zhili Liu
Kai Chen
Jianhua Han
Lanqing Hong
Hang Xu
Zhenguo Li
James T. Kwok
MoE
111
24
0
08 Feb 2024
Attention as Robust Representation for Time Series Forecasting
Peisong Niu
Tian Zhou
Xue Wang
Liang Sun
Rong Jin
AI4TS
19
4
0
08 Feb 2024
Data-efficient Large Vision Models through Sequential Autoregression
Jianyuan Guo
Zhiwei Hao
Chengcheng Wang
Yehui Tang
Han Wu
Han Hu
Kai Han
Chang Xu
VLM
38
10
0
07 Feb 2024
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Quan-Sen Sun
Jinsheng Wang
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Xinlong Wang
VLM
CLIP
MLLM
94
41
0
06 Feb 2024
MOMENT: A Family of Open Time-series Foundation Models
Mononito Goswami
Konrad Szafer
Arjun Choudhry
Yifu Cai
Shuo Li
Artur Dubrawski
AIFin
AI4TS
71
111
0
06 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
Sheng Luo
Wei Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
51
13
0
05 Feb 2024
Can Large Language Models Learn Independent Causal Mechanisms?
Gael Gendron
Bao Trung Nguyen
A. Peng
Michael Witbrock
Gillian Dobbie
LRM
28
3
0
04 Feb 2024
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Quang-Cuong Pham
Giang Do
Huy Nguyen
TrungTin Nguyen
Chenghao Liu
...
Binh T. Nguyen
Savitha Ramasamy
Xiaoli Li
Steven C. H. Hoi
Nhat Ho
25
17
0
04 Feb 2024
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Haoyi Zhu
Yating Wang
Di Huang
Weicai Ye
Wanli Ouyang
Tong He
SSL
3DPC
44
20
0
04 Feb 2024
Deep Spectral Improvement for Unsupervised Image Instance Segmentation
Farnoosh Arefi
Amir M. Mansourian
S. Kasaei
ISeg
31
1
0
04 Feb 2024
Timer: Generative Pre-trained Transformers Are Large Time Series Models
Yong Liu
Haoran Zhang
Chenyu Li
Xiangdong Huang
Jianmin Wang
Mingsheng Long
AIFin
AI4TS
AI4CE
36
48
0
04 Feb 2024
Scale Equalization for Multi-Level Feature Fusion
Bum Jun Kim
Sang Woo Kim
11
1
0
02 Feb 2024
Interpretation of Intracardiac Electrograms Through Textual Representations
William Jongwon Han
Diana Gomez
Avi Alok
Chaojing Duan
Michael A. Rosenberg
Douglas Weber
Emerson Liu
Ding Zhao
26
1
0
02 Feb 2024
Towards Visual Syntactical Understanding
Sayeed Shafayet Chowdhury
Soumyadeep Chandra
Kaushik Roy
NAI
32
0
0
30 Jan 2024
Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors
Shiyin Dong
Mingrui Zhu
Kun Cheng
Nannan Wang
Xinbo Gao
DiffM
30
3
0
29 Jan 2024
Masked Audio Modeling with CLAP and Multi-Objective Learning
Yifei Xin
Xiulian Peng
Yan Lu
49
8
0
29 Jan 2024
Intriguing Equivalence Structures of the Embedding Space of Vision Transformers
Shaeke Salman
M. Shams
Xiuwen Liu
29
6
0
28 Jan 2024
Masked Pre-trained Model Enables Universal Zero-shot Denoiser
Xiaoxiao Ma
Zhixiang Wei
Yi Jin
Pengyang Ling
Tianle Liu
Ben Wang
Junkang Dai
H. Chen
Enhong Chen
VLM
43
2
0
26 Jan 2024
GeoDecoder: Empowering Multimodal Map Understanding
Feng Qi
Mian Dai
Zixian Zheng
Chao Wang
37
1
0
26 Jan 2024
Producing Plankton Classifiers that are Robust to Dataset Shift
Cheng Chen
S. Kyathanahally
Marta Reyes
Stefanie Merkli
E. Merz
Emanuele Francazi
Marvin Hoege
F. Pomati
Marco Baity-Jesi
23
2
0
25 Jan 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
34
14
0
25 Jan 2024
Masked Particle Modeling on Sets: Towards Self-Supervised High Energy Physics Foundation Models
T. Golling
Lukas Heinrich
Michael Kagan
Samuel Klein
Matthew Leigh
Margarita Osadchy
J. A. Raine
20
24
0
24 Jan 2024
Finetuning Foundation Models for Joint Analysis Optimization
M. Vigl
N. Hartman
L. Heinrich
43
12
0
24 Jan 2024
OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease Detection
Fatema Jannat
Sina Gholami
Minha Alam
Hamed Tabkhi
16
1
0
22 Jan 2024
Exploring Simple Open-Vocabulary Semantic Segmentation
Zihang Lai
VLM
21
0
0
22 Jan 2024
Anisotropy Is Inherent to Self-Attention in Transformers
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
15
16
0
22 Jan 2024
Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification
Jimmy Lin
Junkai Li
Jiasi Gao
Weizhi Ma
Yang Liu
20
0
0
21 Jan 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang
Bingyi Kang
Zilong Huang
Xiaogang Xu
Jiashi Feng
Hengshuang Zhao
VLM
155
709
0
19 Jan 2024
LDReg: Local Dimensionality Regularized Self-Supervised Learning
Hanxun Huang
R. Campello
S. Erfani
Xingjun Ma
Michael E. Houle
James Bailey
38
5
0
19 Jan 2024
Exploring scalable medical image encoders beyond text supervision
Fernando Pérez-García
Harshita Sharma
Sam Bond-Taylor
Kenza Bouzid
Valentina Salvatelli
...
Maria T. A. Wetscherek
Noel C. F. Codella
Stephanie L. Hyland
Javier Alvarez-Valle
Ozan Oktay
LM&MA
MedIm
50
26
0
19 Jan 2024
Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder
Yongchen Zhou
Richard Jiang
24
0
0
18 Jan 2024