2103.15808
CvT: Introducing Convolutions to Vision Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
HuggingFace (1 upvote)
Github (227★)
Papers citing "CvT: Introducing Convolutions to Vision Transformers"
50 / 860 papers shown
Optimizing Vision Transformers for Medical Image Segmentation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qianying Liu
Chaitanya Kaul
Jun Wang
Christos Anagnostopoulos
Roderick Murray-Smith
Fani Deligianni
ViT
MedIm
266
38
0
14 Oct 2022
MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images
Fusion (FUSION), 2022
Weiming Li
Lihui Xue
Xueqian Wang
Gang Li
ViT
213
22
0
14 Oct 2022
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
Neural Information Processing Systems (NeurIPS), 2022
Hyeong Kyu Choi
Joonmyung Choi
Hyunwoo J. Kim
ViT
241
42
0
14 Oct 2022
How to Train Vision Transformer on Small-scale Datasets?
British Machine Vision Conference (BMVC), 2022
Hanan Gani
Muzammal Naseer
Mohammad Yaqub
ViT
214
64
0
13 Oct 2022
FontTransformer: Few-shot High-resolution Chinese Glyph Image Synthesis via Stacked Transformers
Pattern Recognition (Pattern Recogn.), 2022
Yitian Liu
Zheng Lian
374
18
0
12 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Neural Information Processing Systems (NeurIPS), 2022
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
267
85
0
12 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
138
20
0
11 Oct 2022
Coded Residual Transform for Generalizable Deep Metric Learning
Neural Information Processing Systems (NeurIPS), 2022
Shichao Kan
Yixiong Liang
Min Li
Yigang Cen
Jianxin Wang
Z. He
267
2
0
09 Oct 2022
Flexible Alignment Super-Resolution Network for Multi-Contrast MRI
Yiming Liu
Mengxi Zhang
Weiqin Zhang
Bo Jiang
Bo Hou
Dan Liu
Jie Chen
Heqing Lian
MedIm
186
4
0
07 Oct 2022
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
Neural Information Processing Systems (NeurIPS), 2022
Yen-Cheng Liu
Chih-Yao Ma
Junjiao Tian
Zijian He
Z. Kira
283
64
0
07 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
International Conference on Learning Representations (ICLR), 2022
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
325
78
0
04 Oct 2022
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling
Yunsung Lee
Gyuseong Lee
Kwang-seok Ryoo
Hyojun Go
Jihye Park
Seung Wook Kim
139
5
0
04 Oct 2022
Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration
Sixiang Chen
Tian-Chun Ye
Yun-Peng Liu
Erkang Chen
ViT
127
17
0
03 Oct 2022
Attention Distillation: self-supervised vision transformer students need more guidance
British Machine Vision Conference (BMVC), 2022
Kai Wang
Fei Yang
Joost van de Weijer
ViT
162
21
0
03 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Spoken Language Technology Workshop (SLT), 2022
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
408
162
0
30 Sep 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
317
143
0
30 Sep 2022
Effective Vision Transformer Training: A Data-Centric Perspective
Benjia Zhou
Pichao Wang
Jun Wan
Yan-Ni Liang
Fan Wang
161
7
0
29 Sep 2022
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Xiangcheng Liu
Tianyi Wu
Guodong Guo
ViT
213
45
0
28 Sep 2022
Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Neelu Madan
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Kamal Nasrollahi
Fahad Shahbaz Khan
T. Moeslund
M. Shah
ViT
MedIm
555
102
0
25 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
IEEE Transactions on Image Processing (IEEE TIP), 2022
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
382
29
0
21 Sep 2022
Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions
Dong Zhang
Yi Lin
Hao Chen
Zhuotao Tian
Xin Yang
Jinhui Tang
Kwang-Ting Cheng
VLM
287
19
0
21 Sep 2022
On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks
Hubert Leterme
K. Polisano
V. Perrier
Karteek Alahari
FAtt
497
2
0
19 Sep 2022
A Mosquito is Worth 16x16 Larvae: Evaluation of Deep Learning Architectures for Mosquito Larvae Classification
Aswin Surya
David B. Peral
Austin VanLoon
A. Rajesh
MedIm
50
5
0
16 Sep 2022
Transformer based Fingerprint Feature Extraction
International Conference on Pattern Recognition (ICPR), 2022
Saraansh Tandon
A. Namboodiri
ViT
210
12
0
08 Sep 2022
Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations
International Conference on 3D Vision (3DV), 2022
Vadim Tschernezki
Iro Laina
Diane Larlus
Andrea Vedaldi
530
232
0
07 Sep 2022
Fusion of Satellite Images and Weather Data with Transformer Networks for Downy Mildew Disease Detection
IEEE Access (IEEE Access), 2022
William Maillet
Maryam Ouhami
A. Hafiane
ViT
MedIm
124
11
0
06 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM
SSL
144
2
0
06 Sep 2022
ELMformer: Efficient Raw Image Restoration with a Locally Multiplicative Transformer
ACM Multimedia (ACM MM), 2022
Jiaqi Ma
Shengyuan Yan
Guang Dai
Guoli Wang
Qian Zhang
164
10
0
31 Aug 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Neurocomputing (Neurocomputing), 2022
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
162
23
0
31 Aug 2022
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
244
2
0
30 Aug 2022
Adaptive Perception Transformer for Temporal Action Localization
Yizheng Ouyang
Tianjin Zhang
Weibo Gu
Hongfa Wang
240
3
0
25 Aug 2022
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Mocho Go
Hideyuki Tachibana
ViT
171
11
0
24 Aug 2022
Efficient Attention-free Video Shift Transformers
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
ViT
218
1
0
23 Aug 2022
FocusFormer: Focusing on What We Need via Architecture Sampler
Jing Liu
Jianfei Cai
Bohan Zhuang
162
9
0
23 Aug 2022
DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection
Jingyu Lin
Jie Jiang
Y. Yan
Chunchao Guo
Hongfa Wang
Wei Liu
Hanzi Wang
ViT
142
3
0
21 Aug 2022
Improved Image Classification with Token Fusion
IEEE Access (IEEE Access), 2022
Keong-Hun Choi
Jin-Woo Kim
Yaolong Wang
J. Ha
ViT
183
0
0
19 Aug 2022
Learning Spatial-Frequency Transformer for Visual Object Tracking
Ju Huang
Tianlin Li
Yuanchao Bai
Zhe Wu
Jianlin Zhang
Yongmei Huang
ViT
337
73
0
18 Aug 2022
Conviformers: Convolutionally guided Vision Transformer
Mohit Vaishnav
Thomas Fel
I. F. Rodriguez
Thomas Serre
ViT
308
2
0
17 Aug 2022
Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model
Xiulong Yang
Sheng-Min Shih
Yinlin Fu
Xiaoting Zhao
Shihao Ji
DiffM
263
62
0
16 Aug 2022
Flow-Guided Transformer for Video Inpainting
European Conference on Computer Vision (ECCV), 2022
Kaiwen Zhang
Jingjing Fu
Dong Liu
ViT
218
101
0
14 Aug 2022
Class-attention Video Transformer for Engagement Intensity Prediction
Xusheng Ai
Victor S. Sheng
Chunhua Li
Zhiming Cui
ViT
142
12
0
12 Aug 2022
Deep is a Luxury We Don't Have
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
Ahmed Taha
Yen Nhi Truong Vu
Brent Mombourquette
Thomas P. Matthews
Jason Su
Sadanand Singh
ViT
MedIm
173
3
0
11 Aug 2022
DropKey
Bonan Li
Yinhan Hu
Xuecheng Nie
Congying Han
Xiangjian Jiang
Tiande Guo
Luoqi Liu
232
13
0
04 Aug 2022
Maintaining Performance with Less Data
Dominic Sanderson
Tatiana Kalganova
260
1
0
03 Aug 2022
Global-Local Self-Distillation for Visual Representation Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tim Lebailly
Tinne Tuytelaars
SSL
138
6
0
29 Jul 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Neural Information Processing Systems (NeurIPS), 2022
Yongming Rao
Wenliang Zhao
Yansong Tang
Jie Zhou
Ser-Nam Lim
Jiwen Lu
ViT
444
338
0
28 Jul 2022
DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer
Knowledge-Based Systems (KBS), 2022
Hao Li
Zhijing Yang
Xiaobin Hong
Ziying Zhao
Junyang Chen
Yukai Shi
Jin-shan Pan
DiffM
ViT
168
17
0
28 Jul 2022
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger
European Conference on Computer Vision (ECCV), 2022
Cong Wang
Hongmin Xu
Xiong Zhang
Li Wang
Zhitong Zheng
Haifeng Liu
ViT
111
29
0
27 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
European Conference on Computer Vision (ECCV), 2022
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
225
56
0
26 Jul 2022
Self-Distilled Vision Transformer for Domain Generalization
Asian Conference on Computer Vision (ACCV), 2022
M. Sultana
Muzammal Naseer
Muhammad Haris Khan
Salman Khan
Fahad Shahbaz Khan
ViT
300
52
0
25 Jul 2022