CvT: Introducing Convolutions to Vision Transformers

IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 860 papers shown
Optimizing Vision Transformers for Medical Image Segmentation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qianying Liu
Chaitanya Kaul
Jun Wang
Christos Anagnostopoulos
Roderick Murray-Smith
Fani Deligianni
ViT, MedIm
266
38
0
14 Oct 2022
MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images
Fusion (FUSION), 2022
Weiming Li
Lihui Xue
Xueqian Wang
Gang Li
ViT
213
22
0
14 Oct 2022
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
Neural Information Processing Systems (NeurIPS), 2022
Hyeong Kyu Choi
Joonmyung Choi
Hyunwoo J. Kim
ViT
241
42
0
14 Oct 2022
How to Train Vision Transformer on Small-scale Datasets?
British Machine Vision Conference (BMVC), 2022
Hanan Gani
Muzammal Naseer
Mohammad Yaqub
ViT
214
64
0
13 Oct 2022
FontTransformer: Few-shot High-resolution Chinese Glyph Image Synthesis via Stacked Transformers
Pattern Recognition (Pattern Recogn.), 2022
Yitian Liu
Zheng Lian
374
18
0
12 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Neural Information Processing Systems (NeurIPS), 2022
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
267
85
0
12 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
138
20
0
11 Oct 2022
Coded Residual Transform for Generalizable Deep Metric Learning
Neural Information Processing Systems (NeurIPS), 2022
Shichao Kan
Yixiong Liang
Min Li
Yigang Cen
Jianxin Wang
Z. He
267
2
0
09 Oct 2022
Flexible Alignment Super-Resolution Network for Multi-Contrast MRI
Yiming Liu
Mengxi Zhang
Weiqin Zhang
Bo Jiang
Bo Hou
Dan Liu
Jie Chen
Heqing Lian
MedIm
186
4
0
07 Oct 2022
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
Neural Information Processing Systems (NeurIPS), 2022
Yen-Cheng Liu
Chih-Yao Ma
Junjiao Tian
Zijian He
Z. Kira
283
64
0
07 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
International Conference on Learning Representations (ICLR), 2022
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT, MoE
325
78
0
04 Oct 2022
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling
Yunsung Lee
Gyuseong Lee
Kwang-seok Ryoo
Hyojun Go
Jihye Park
Seung Wook Kim
139
5
0
04 Oct 2022
Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration
Sixiang Chen
Tian-Chun Ye
Yun-Peng Liu
Erkang Chen
ViT
127
17
0
03 Oct 2022
Attention Distillation: self-supervised vision transformer students need more guidance
British Machine Vision Conference (BMVC), 2022
Kai Wang
Fei Yang
Joost van de Weijer
ViT
162
21
0
03 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Spoken Language Technology Workshop (SLT), 2022
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
408
162
0
30 Sep 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
317
143
0
30 Sep 2022
Effective Vision Transformer Training: A Data-Centric Perspective
Benjia Zhou
Pichao Wang
Jun Wan
Yan-Ni Liang
Fan Wang
161
7
0
29 Sep 2022
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Xiangcheng Liu
Tianyi Wu
Guodong Guo
ViT
213
45
0
28 Sep 2022
Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Neelu Madan
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Kamal Nasrollahi
Fahad Shahbaz Khan
T. Moeslund
M. Shah
ViT, MedIm
555
102
0
25 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
IEEE Transactions on Image Processing (IEEE TIP), 2022
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
382
29
0
21 Sep 2022
Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions
Dong Zhang
Yi Lin
Hao Chen
Zhuotao Tian
Xin Yang
Jinhui Tang
Kwang-Ting Cheng
VLM
287
19
0
21 Sep 2022
On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks
Hubert Leterme
K. Polisano
V. Perrier
Alahari Karteek
FAtt
497
2
0
19 Sep 2022
A Mosquito is Worth 16x16 Larvae: Evaluation of Deep Learning Architectures for Mosquito Larvae Classification
Aswin Surya
David B. Peral
Austin VanLoon
A. Rajesh
MedIm
50
5
0
16 Sep 2022
Transformer based Fingerprint Feature Extraction
International Conference on Pattern Recognition (ICPR), 2022
Saraansh Tandon
A. Namboodiri
ViT
210
12
0
08 Sep 2022
Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations
International Conference on 3D Vision (3DV), 2022
Vadim Tschernezki
Iro Laina
Diane Larlus
Andrea Vedaldi
530
232
0
07 Sep 2022
Fusion of Satellite Images and Weather Data with Transformer Networks for Downy Mildew Disease Detection
IEEE Access (IEEE Access), 2022
William Maillet
Maryam Ouhami
A. Hafiane
ViT, MedIm
124
11
0
06 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM, SSL
144
2
0
06 Sep 2022
ELMformer: Efficient Raw Image Restoration with a Locally Multiplicative Transformer
ACM Multimedia (ACM MM), 2022
Jiaqi Ma
Shengyuan Yan
Guang Dai
Guoli Wang
Qian Zhang
164
10
0
31 Aug 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Neurocomputing (Neurocomputing), 2022
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
162
23
0
31 Aug 2022
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
244
2
0
30 Aug 2022
Adaptive Perception Transformer for Temporal Action Localization
Yizheng Ouyang
Tianjin Zhang
Weibo Gu
Hongfa Wang
240
3
0
25 Aug 2022
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Mocho Go
Hideyuki Tachibana
ViT
171
11
0
24 Aug 2022
Efficient Attention-free Video Shift Transformers
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
ViT
218
1
0
23 Aug 2022
FocusFormer: Focusing on What We Need via Architecture Sampler
Jing Liu
Jianfei Cai
Bohan Zhuang
162
9
0
23 Aug 2022
DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection
Jingyu Lin
Jie Jiang
Y. Yan
Chunchao Guo
Hongfa Wang
Wei Liu
Hanzi Wang
ViT
142
3
0
21 Aug 2022
Improved Image Classification with Token Fusion
IEEE Access (IEEE Access), 2022
Keong-Hun Choi
Jin-Woo Kim
Yaolong Wang
J. Ha
ViT
183
0
0
19 Aug 2022
Learning Spatial-Frequency Transformer for Visual Object Tracking
Ju Huang
Tianlin Li
Yuanchao Bai
Zhe Wu
Jianlin Zhang
Yongmei Huang
ViT
337
73
0
18 Aug 2022
Conviformers: Convolutionally guided Vision Transformer
Mohit Vaishnav
Thomas Fel
I. F. Rodriguez
Thomas Serre
ViT
308
2
0
17 Aug 2022
Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model
Xiulong Yang
Sheng-Min Shih
Yinlin Fu
Xiaoting Zhao
Shihao Ji
DiffM
263
62
0
16 Aug 2022
Flow-Guided Transformer for Video Inpainting
European Conference on Computer Vision (ECCV), 2022
Kaiwen Zhang
Jingjing Fu
Dong Liu
ViT
218
101
0
14 Aug 2022
Class-attention Video Transformer for Engagement Intensity Prediction
Xusheng Ai
Victor S. Sheng
Chunhua Li
Zhiming Cui
ViT
142
12
0
12 Aug 2022
Deep is a Luxury We Don't Have
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
Ahmed Taha
Yen Nhi Truong Vu
Brent Mombourquette
Thomas P. Matthews
Jason Su
Sadanand Singh
ViT, MedIm
173
3
0
11 Aug 2022
DropKey
Bonan Li
Yinhan Hu
Xuecheng Nie
Congying Han
Xiangjian Jiang
Tiande Guo
Luoqi Liu
232
13
0
04 Aug 2022
Maintaining Performance with Less Data
Dominic Sanderson
Tatiana Kalgonova
260
1
0
03 Aug 2022
Global-Local Self-Distillation for Visual Representation Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tim Lebailly
Tinne Tuytelaars
SSL
138
6
0
29 Jul 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Neural Information Processing Systems (NeurIPS), 2022
Yongming Rao
Wenliang Zhao
Yansong Tang
Jie Zhou
Ser-Nam Lim
Jiwen Lu
ViT
444
338
0
28 Jul 2022
DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer
Knowledge-Based Systems (KBS), 2022
Hao Li
Zhijing Yang
Xiaobin Hong
Ziying Zhao
Junyang Chen
Yukai Shi
Jin-shan Pan
DiffM, ViT
168
17
0
28 Jul 2022
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger
European Conference on Computer Vision (ECCV), 2022
Cong Wang
Hongmin Xu
Xiong Zhang
Li Wang
Zhitong Zheng
Haifeng Liu
ViT
111
29
0
27 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
European Conference on Computer Vision (ECCV), 2022
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP, VLM
225
56
0
26 Jul 2022
Self-Distilled Vision Transformer for Domain Generalization
Asian Conference on Computer Vision (ACCV), 2022
M. Sultana
Muzammal Naseer
Muhammad Haris Khan
Salman Khan
Fahad Shahbaz Khan
ViT
300
52
0
25 Jul 2022