CvT: Introducing Convolutions to Vision Transformers

IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 860 papers shown
PVT v2: Improved Baselines with Pyramid Vision Transformer
Computational Visual Media (CVM), 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT, AI4TS
791
2,143
0
25 Jun 2021
ViTAS: Vision Transformer Architecture Search
European Conference on Computer Vision (ECCV), 2021
Xiu Su
Shan You
Jiyang Xie
Mingkai Zheng
Haiwei Yang
Chao Qian
Changshui Zhang
Xiaogang Wang
Chang Xu
ViT
457
56
0
25 Jun 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
424
378
0
24 Jun 2021
IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan
Yikang Shen
Lezhi Li
Zinan Lin
Rogerio Feris
A. Oliva
VLM, ViT
329
191
0
23 Jun 2021
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition
Qibin Hou
Zihang Jiang
Li-xin Yuan
Ming-Ming Cheng
Shuicheng Yan
Jiashi Feng
ViT, MLLM
306
236
0
23 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Yu-Huan Wu
Yun-Hai Liu
Xin Zhan
Ming-Ming Cheng
ViT
609
289
0
22 Jun 2021
Encoder-Decoder Architectures for Clinically Relevant Coronary Artery Segmentation
International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), 2021
João Lourenço Silva
M. Menezes
T. Rodrigues
B. Silva
F. Pinto
Arlindo L. Oliveira
MedIm
216
22
0
21 Jun 2021
More than Encoder: Introducing Transformer Decoder to Upsample
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021
Yijiang Li
Wentian Cai
Ying Gao
Chengming Li
Xiping Hu
ViT, MedIm
254
75
0
20 Jun 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
345
776
0
18 Jun 2021
Efficient Self-supervised Vision Transformers for Representation Learning
International Conference on Learning Representations (ICLR), 2021
Chunyuan Li
Jianwei Yang
Pengchuan Zhang
Mei Gao
Bin Xiao
Xiyang Dai
Lu Yuan
Jianfeng Gao
ViT
303
224
0
17 Jun 2021
S$^2$-MLP: Spatial-Shift MLP Architecture for Vision
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
261
219
0
14 Jun 2021
Styleformer: Transformer based Generative Adversarial Networks with Style Vector
Computer Vision and Pattern Recognition (CVPR), 2021
Jeeseung Park
Younggeun Kim
ViT
314
59
0
13 Jun 2021
MlTr: Multi-label Classification with Transformer
IEEE International Conference on Multimedia and Expo (ICME), 2021
Xingyi Cheng
Hezheng Lin
Xiangyu Wu
Fan Yang
Dong Shen
Zhongyuan Wang
Nian Shi
Honglin Liu
ViT
176
58
0
11 Jun 2021
Transformed CNNs: recasting pre-trained convolutional layers with self-attention
Stéphane d'Ascoli
Levent Sagun
Giulio Biroli
Ari S. Morcos
ViT
106
7
0
10 Jun 2021
CAT: Cross Attention in Vision Transformer
IEEE International Conference on Multimedia and Expo (ICME), 2021
Hezheng Lin
Xingyi Cheng
Xiangyu Wu
Fan Yang
Dong Shen
Zhongyuan Wang
Qing Song
Wei Yuan
ViT
187
260
0
10 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Neural Information Processing Systems (NeurIPS), 2021
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
578
1,478
0
09 Jun 2021
TED-net: Convolution-free T2T Vision Transformer-based Encoder-decoder Dilation network for Low-dose CT Denoising
Dayang Wang
Zhan Wu
Hengyong Yu
ViT, MedIm
211
66
0
08 Jun 2021
On the Connection between Local Attention and Dynamic Depth-wise Convolution
International Conference on Learning Representations (ICLR), 2021
Qi Han
Zejia Fan
Jingdong Sun
Lei-huan Sun
Ming-Ming Cheng
Jiaying Liu
Jingdong Wang
ViT
366
133
0
08 Jun 2021
On Improving Adversarial Transferability of Vision Transformers
International Conference on Learning Representations (ICLR), 2021
Muzammal Naseer
Kanchana Ranasinghe
Salman Khan
Fahad Shahbaz Khan
Fatih Porikli
ViT
262
107
0
08 Jun 2021
Fully Transformer Networks for Semantic Image Segmentation
Sitong Wu
Tianyi Wu
Fangjian Lin
Sheng Tian
Guodong Guo
ViT
289
47
0
08 Jun 2021
Efficient Training of Visual Transformers with Small Datasets
Neural Information Processing Systems (NeurIPS), 2021
Yahui Liu
E. Sangineto
Wei Bi
Andrii Zadaianchuk
Bruno Lepri
Marco De Nadai
ViT
194
215
0
07 Jun 2021
Reveal of Vision Transformers Robustness against Adversarial Attacks
Ahmed Aldahdooh
W. Hamidouche
Olivier Déforges
ViT
247
68
0
07 Jun 2021
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
Zilong Huang
Youcheng Ben
Guozhong Luo
Pei Cheng
Gang Yu
Bin-Bin Fu
ViT
276
208
0
07 Jun 2021
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
Neural Information Processing Systems (NeurIPS), 2021
Yufei Xu
Qiming Zhang
Jing Zhang
Dacheng Tao
ViT
443
396
0
07 Jun 2021
Vision Transformers with Hierarchical Attention
Machine Intelligence Research (MIR), 2021
Yun-Hai Liu
Yu-Huan Wu
Guolei Sun
Le Zhang
Ajad Chhatkuli
Luc Van Gool
ViT
184
72
0
06 Jun 2021
CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
Neural Information Processing Systems (NeurIPS), 2021
Tatiana Likhomanenko
Qiantong Xu
Gabriel Synnaeve
R. Collobert
A. Rogozhnikov
OOD, ViT
343
70
0
06 Jun 2021
Uformer: A General U-Shaped Transformer for Image Restoration
Computer Vision and Pattern Recognition (CVPR), 2021
Zhendong Wang
Xiaodong Cun
Jianmin Bao
Wengang Zhou
Jianzhuang Liu
Houqiang Li
ViT
509
1,912
0
06 Jun 2021
RegionViT: Regional-to-Local Attention for Vision Transformers
International Conference on Learning Representations (ICLR), 2021
Chun-Fu Chen
Yikang Shen
Quanfu Fan
ViT
478
234
0
04 Jun 2021
Glance-and-Gaze Vision Transformer
Neural Information Processing Systems (NeurIPS), 2021
Qihang Yu
Yingda Xia
Yutong Bai
Yongyi Lu
Alan Yuille
Wei Shen
ViT
162
83
0
04 Jun 2021
X-volution: On the unification of convolution and self-attention
Xuanhong Chen
Hang Wang
Bingbing Ni
ViT
154
27
0
04 Jun 2021
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
A. M. Hafiz
S. A. Parah
R. A. Bhat
228
56
0
03 Jun 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Neural Information Processing Systems (NeurIPS), 2021
Enze Xie
Wenhai Wang
Zhiding Yu
Anima Anandkumar
J. Álvarez
Ping Luo
ViT
1.2K
7,116
0
31 May 2021
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens
Computer Vision and Pattern Recognition (CVPR), 2021
Jiemin Fang
Lingxi Xie
Xinggang Wang
Xiaopeng Zhang
Wenyu Liu
Qi Tian
ViT
231
84
0
31 May 2021
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Neural Information Processing Systems (NeurIPS), 2021
Jiangning Zhang
Chao Xu
Jian Li
Wenzhou Chen
Yabiao Wang
Ying Tai
Shuo Chen
Chengjie Wang
Feiyue Huang
Yong Liu
288
26
0
31 May 2021
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition
Neural Information Processing Systems (NeurIPS), 2021
Yulin Wang
Rui Huang
Qing Xiao
Zeyi Huang
Gao Huang
ViT
283
234
0
31 May 2021
Dual-stream Network for Visual Recognition
Neural Information Processing Systems (NeurIPS), 2021
Mingyuan Mao
Renrui Zhang
Honghui Zheng
Shiyang Feng
Teli Ma
Yan Peng
Errui Ding
Baochang Zhang
Shumin Han
ViT
282
78
0
31 May 2021
Less is More: Pay Less Attention in Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zizheng Pan
Bohan Zhuang
Haoyu He
Jing Liu
Jianfei Cai
ViT
341
102
0
29 May 2021
KVT: k-NN Attention for Boosting Vision Transformers
European Conference on Computer Vision (ECCV), 2021
Pichao Wang
Qingsong Wen
F. Wang
Ming Lin
Shuning Chang
Hao Li
Rong Jin
ViT
260
130
0
28 May 2021
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zizhao Zhang
Han Zhang
Long Zhao
Ting Chen
Sercan O. Arik
Tomas Pfister
ViT
357
207
0
26 May 2021
Pay Attention to MLPs
Neural Information Processing Systems (NeurIPS), 2021
Hanxiao Liu
Zihang Dai
David R. So
Quoc V. Le
AI4CE
622
807
0
17 May 2021
Towards Robust Vision Transformer
Computer Vision and Pattern Recognition (CVPR), 2021
Xiaofeng Mao
Gege Qi
YueFeng Chen
Xiaodan Li
Ranjie Duan
Shaokai Ye
Yuan He
Hui Xue
ViT
466
233
0
17 May 2021
Waste detection in Pomerania: non-profit project for detecting waste in environment
Waste Management (Waste Manag.), 2021
Sylwia Majchrowska
Agnieszka Mikołajczyk
M. Ferlin
Zuzanna Klawikowska
Marta A. Plantykow
Arkadiusz Kwasigroch
K. Majek
258
167
0
12 May 2021
Homogeneous vector bundles and $G$-equivariant convolutional neural networks
Sampling Theory, Signal Processing, and Data Analysis (SAMPTA), 2021
J. Aronsson
211
27
0
12 May 2021
Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet
Luke Melas-Kyriazi
ViT
122
116
0
06 May 2021
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Meng-Hao Guo
Zheng-Ning Liu
Tai-Jiang Mu
Shimin Hu
232
640
0
05 May 2021
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Neural Information Processing Systems (NeurIPS), 2021
Xiangxiang Chu
Zhi Tian
Yuqing Wang
Bo Zhang
Haibing Ren
Xiaolin K. Wei
Huaxia Xia
Chunhua Shen
ViT
665
1,226
0
28 Apr 2021
Vision Transformers with Patch Diversification
Chengyue Gong
Dilin Wang
Meng Li
Vikas Chandra
Qiang Liu
ViT
257
68
0
26 Apr 2021
Visformer: The Vision-friendly Transformer
IEEE International Conference on Computer Vision (ICCV), 2021
Zhengsu Chen
Lingxi Xie
Jianwei Niu
Xuefeng Liu
Longhui Wei
Qi Tian
ViT
530
275
0
26 Apr 2021
VidTr: Video Transformer Without Convolutions
IEEE International Conference on Computer Vision (ICCV), 2021
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
429
220
0
23 Apr 2021
All Tokens Matter: Token Labeling for Training Better Vision Transformers
Neural Information Processing Systems (NeurIPS), 2021
Zihang Jiang
Qibin Hou
Li-xin Yuan
Daquan Zhou
Yujun Shi
Xiaojie Jin
Anran Wang
Jiashi Feng
ViT
403
237
0
22 Apr 2021