CvT: Introducing Convolutions to Vision Transformers


IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 860 papers shown
Brain-inspired Multilayer Perceptron with Spiking Neurons
Computer Vision and Pattern Recognition (CVPR), 2022
Wenshuo Li
Hanting Chen
Jianyuan Guo
Ziyang Zhang
Yunhe Wang
179
40
0
28 Mar 2022
Transformers Meet Visual Learning Understanding: A Comprehensive Review
Yuting Yang
Licheng Jiao
Xuantong Liu
Fan Liu
Shuyuan Yang
Zhixi Feng
Xu Tang
ViT, MedIm
218
35
0
24 Mar 2022
Contrastive Transformer-based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
Yu Tian
Guansong Pang
Fengbei Liu
Yuyuan Liu
Chong Wang
Yuanhong Chen
Johan Verjans
G. Carneiro
ViT, MedIm
279
35
0
23 Mar 2022
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Ryan Grainger
Thomas Paniagua
Xi Song
Naresh P. Cuntoor
Mun Wai Lee
Tianfu Wu
ViT
167
19
0
22 Mar 2022
Focal Modulation Networks
Neural Information Processing Systems (NeurIPS), 2022
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
352
388
0
22 Mar 2022
MixFormer: End-to-End Tracking with Iterative Mixed Attention
Computer Vision and Pattern Recognition (CVPR), 2022
Yutao Cui
Jiang Cheng
Limin Wang
Gangshan Wu
VOT
324
710
0
21 Mar 2022
ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
European Conference on Computer Vision (ECCV), 2022
Rui Yang
Hailong Ma
Jie Wu
Yansong Tang
Xuefeng Xiao
Min Zheng
Xiu Li
ViT
321
64
0
21 Mar 2022
HIPA: Hierarchical Patch Transformer for Single Image Super Resolution
IEEE Transactions on Image Processing (IEEE TIP), 2022
Qing Cai
Yiming Qian
Jinxing Li
Junjie Lv
Yee-Hong Yang
Feng Wu
Dafan Zhang
253
49
0
19 Mar 2022
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Mingxin Huang
Yuliang Liu
Zhenghao Peng
Chongyu Liu
Dahua Lin
Shenggao Zhu
N. Yuan
Kai Ding
Lianwen Jin
ViT
212
138
0
19 Mar 2022
CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance
Computer Vision and Pattern Recognition (CVPR), 2022
Tianchen Zhao
Niansong Zhang
Xuefei Ning
He Wang
Li Yi
Yu Wang
3DPC, ViT
183
11
0
18 Mar 2022
Three things everyone should know about Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Jakob Verbeek
Edouard Grave
ViT
249
155
0
18 Mar 2022
SepTr: Separable Transformer for Audio Spectrogram Processing
Interspeech, 2022
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Fahad Shahbaz Khan
ViT
339
39
0
17 Mar 2022
PanoFormer: Panorama Transformer for Indoor 360 Depth Estimation
European Conference on Computer Vision (ECCV), 2022
Zhijie Shen
Chunyu Lin
K. Liao
Lang Nie
Zishuo Zheng
Yao Zhao
ViT, MDE
180
125
0
17 Mar 2022
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Yang He
Weihan Liang
Dongyang Zhao
Hong-Yu Zhou
Weifeng Ge
Yizhou Yu
Wenqiang Zhang
ViT
253
57
0
17 Mar 2022
Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?
International Conference on Learning Representations (ICLR), 2022
Y. Fu
Shunyao Zhang
Shan-Hung Wu
Cheng Wan
Yingyan Lin
AAML
419
83
0
16 Mar 2022
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
European Conference on Computer Vision (ECCV), 2022
Hanrong Ye
Dan Xu
ViT
267
114
0
15 Mar 2022
Enriched CNN-Transformer Feature Aggregation Networks for Super-Resolution
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Jinsu Yoo
Taehoon Kim
Sihaeng Lee
Seunghyeon Kim
Hankook Lee
Tae Hyun Kim
SupR, ViT
265
88
0
15 Mar 2022
TransCAM: Transformer Attention-based CAM Refinement for Weakly Supervised Semantic Segmentation
Journal of Visual Communication and Image Representation (JVCIR), 2022
Ruiwen Li
Zheda Mai
C. Trabelsi
Zhibo Zhang
Jongseong Jang
Scott Sanner
ViT
198
77
0
14 Mar 2022
Deep Transformers Thirst for Comprehensive-Frequency Data
R. Xia
Chao Xue
Boyu Deng
Fang Wang
Jingchao Wang
ViT
277
0
0
14 Mar 2022
Self-Promoted Supervision for Few-Shot Transformer
European Conference on Computer Vision (ECCV), 2022
Bowen Dong
Pan Zhou
Shuicheng Yan
W. Zuo
ViT
184
52
0
14 Mar 2022
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaohan Ding
Xinming Zhang
Yi Zhou
Jungong Han
Guiguang Ding
Jian Sun
VLM
382
686
0
13 Mar 2022
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
Computer Vision and Pattern Recognition (CVPR), 2022
Tianlong Chen
Zhenyu Zhang
Yu Cheng
Ahmed Hassan Awadallah
Zinan Lin
ViT
260
49
0
12 Mar 2022
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
International Conference on Learning Representations (ICLR), 2022
Peihao Wang
Wenqing Zheng
Tianlong Chen
Zinan Lin
ViT
289
195
0
09 Mar 2022
ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer
European Conference on Computer Vision (ECCV), 2022
Haokui Zhang
Wenze Hu
Xiaoyu Wang
ViT
352
75
0
08 Mar 2022
Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Kai Liu
Tianyi Wu
Cong Liu
Guodong Guo
ViT
296
20
0
08 Mar 2022
WaveMix: Resource-efficient Token Mixing for Images
Pranav Jeevan
A. Sethi
111
15
0
07 Mar 2022
Stepwise Feature Fusion: Local Guides Global
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
Jinfeng Wang
Qiming Huang
Feilong Tang
Jia Meng
Jionglong Su
Sifan Song
ViT, MedIm
259
247
0
07 Mar 2022
Knowledge Amalgamation for Object Detection with Transformers
IEEE Transactions on Image Processing (IEEE TIP), 2022
Haofei Zhang
Feng Mao
Mengqi Xue
Gongfan Fang
Zunlei Feng
Mingli Song
Weilong Dai
ViT
385
16
0
07 Mar 2022
Multi-Tailed Vision Transformer for Efficient Inference
Neural Networks (NN), 2022
Yunke Wang
Bo Du
Wenyuan Wang
Chang Xu
ViT
594
12
0
03 Mar 2022
ViTransPAD: Video Transformer using convolution and self-attention for Face Presentation Attack Detection
IEEE International Conference on Image Processing (ICIP), 2022
Zuheng Ming
Zitong Yu
M. Al-Ghadi
M. Visani
M. Luqman
J. Burie
ViT, CVBM
153
25
0
03 Mar 2022
Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions
IEEE International Conference on Consumer Electronics (ICCE), 2022
Ruikang Ju
Ting-Yu Lin
Jen-Shiun Chiang
Jia-Hao Jian
Yu-Shian Lin
Liu-Rui-Yi Huang
ViT
147
2
0
02 Mar 2022
3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
Dening Lu
Qian Xie
Linlin Xu
Jonathan Li
3DV
194
93
0
02 Mar 2022
A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark
Yunhe Gao
Mu Zhou
Ding Liu
Zhennan Yan
Shaoting Zhang
Dimitris N. Metaxas
ViT, MedIm
707
97
0
28 Feb 2022
CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising
Physics in Medicine and Biology (PMB), 2022
Dayang Wang
Fenglei Fan
Zhan Wu
R. Liu
Haiwei Yang
Hengyong Yu
ViT, MedIm
181
176
0
28 Feb 2022
Factorizer: A Scalable Interpretable Approach to Context Modeling for Medical Image Segmentation
Pooya Ashtari
Diana Sima
L. De Lathauwer
D. Sappey-Marinier
F. Maes
Sabine Van Huffel
ViT, MedIm
244
47
0
24 Feb 2022
Auto-scaling Vision Transformers without Training
International Conference on Learning Representations (ICLR), 2022
Wuyang Chen
Wei-Ping Huang
Xianzhi Du
Xiaodan Song
Zinan Lin
Denny Zhou
ViT
150
27
0
24 Feb 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
Computer Vision and Pattern Recognition (CVPR), 2022
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViT, VLM
762
633
0
22 Feb 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
International Journal of Computer Vision (IJCV), 2022
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
287
276
0
21 Feb 2022
Visual Attention Network
Computational Visual Media (CVM), 2022
Meng-Hao Guo
Chengrou Lu
Zheng-Ning Liu
Ming-Ming Cheng
Shiyong Hu
ViT, VLM
513
887
0
20 Feb 2022
Discriminability-enforcing loss to improve representation learning
Florinel-Alin Croitoru
Diana-Nicoleta Grigore
Radu Tudor Ionescu
FaML
133
1
0
14 Feb 2022
CATs++: Boosting Cost Aggregation with Convolutions and Transformers
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Seokju Cho
Sunghwan Hong
Seung Wook Kim
ViT
375
57
0
14 Feb 2022
Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs
Huangjie Zheng
Pengcheng He
Weizhu Chen
Mingyuan Zhou
115
16
0
14 Feb 2022
BViT: Broad Attention based Vision Transformer
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Nannan Li
Yaran Chen
Weifan Li
Zixiang Ding
Dong Zhao
ViT
261
30
0
13 Feb 2022
Feature-level augmentation to improve robustness of deep neural networks to affine transformations
A. Sandru
Mariana-Iuliana Georgescu
Radu Tudor Ionescu
OOD
377
7
0
10 Feb 2022
LwPosr: Lightweight Efficient Fine-Grained Head Pose Estimation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Naina Dhingra
131
17
0
07 Feb 2022
Towards an Analytical Definition of Sufficient Data
SN Computer Science (SN Comput. Sci.), 2022
Adam Byerly
T. Kalganova
227
5
0
07 Feb 2022
Training Vision Transformers with Only 2040 Images
European Conference on Computer Vision (ECCV), 2022
Yunhao Cao
Hao Yu
Jianxin Wu
ViT
392
56
0
26 Jan 2022
Convolutional Xformers for Vision
Pranav Jeevan
Amit Sethi
ViT
167
14
0
25 Jan 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Kunchang Li
Yali Wang
Junhao Zhang
Shiyang Feng
Guanglu Song
Yu Liu
Jiaming Song
Yu Qiao
ViT
546
532
0
24 Jan 2022
Improving Chest X-Ray Report Generation by Leveraging Warm Starting
Aaron Nicolson
Jason Dowling
Bevan Koopman
ViT, LM&MA, MedIm
281
154
0
24 Jan 2022