CvT: Introducing Convolutions to Vision Transformers

IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 860 papers shown
D3Former: Debiased Dual Distilled Transformer for Incremental Learning
Abdel-rahman Mohamed
Rushali Grandhe
KJ Joseph
Salman Khan
Fahad Shahbaz Khan
CLL
277
13
0
25 Jul 2022
Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer
Yingyi Chen
Xiaoke Shen
Yahui Liu
Qinghua Tao
Johan A. K. Suykens
AAML, ViT
200
37
0
25 Jul 2022
Online Continual Learning with Contrastive Vision Transformer
European Conference on Computer Vision (ECCV), 2022
Zhen Wang
Liu Liu
Yajing Kong
Jiaxian Guo
Dacheng Tao
CLL
186
42
0
24 Jul 2022
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection
European Conference on Computer Vision (ECCV), 2022
Yuetian Weng
Zizheng Pan
Mingfei Han
Xiaojun Chang
Bohan Zhuang
ViT
175
31
0
21 Jul 2022
Locality Guidance for Improving Vision Transformers on Tiny Datasets
European Conference on Computer Vision (ECCV), 2022
Kehan Li
Runyi Yu
Zhennan Wang
Li-ming Yuan
Guoli Song
Jie Chen
ViT
165
59
0
20 Jul 2022
Vision Transformers: From Semantic Segmentation to Dense Prediction
International Journal of Computer Vision (IJCV), 2022
Li Zhang
Jiachen Lu
Sixiao Zheng
Xinxuan Zhao
Xiatian Zhu
Yanwei Fu
Tao Xiang
Jianfeng Feng
Philip H. S. Torr
ViT
282
17
0
19 Jul 2022
Defect Transformer: An Efficient Hybrid Transformer Architecture for Surface Defect Detection
Junpu Wang
Guili Xu
Fuju Yan
Jinjin Wang
Zhengsheng Wang
ViT, MedIm
198
104
0
17 Jul 2022
SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection
Computer Vision and Image Understanding (CVIU), 2022
Antonio Bărbălău
Radu Tudor Ionescu
Mariana-Iuliana Georgescu
J. Dueholm
B. Ramachandra
Kamal Nasrollahi
Fahad Shahbaz Khan
T. Moeslund
M. Shah
ViT
491
93
0
16 Jul 2022
Convolutional Bypasses Are Better Vision Transformer Adapters
European Conference on Artificial Intelligence (ECAI), 2022
Shibo Jie
Zhi-Hong Deng
VPVLM
283
162
0
14 Jul 2022
N-Grammer: Augmenting Transformers with latent n-grams
Aurko Roy
Rohan Anil
Guangda Lai
Benjamin Lee
Jeffrey Zhao
...
Yu
Phuong Dao
Christopher Fifty
Zhiwen Chen
Yonghui Wu
189
9
0
13 Jul 2022
Eliminating Gradient Conflict in Reference-based Line-Art Colorization
European Conference on Computer Vision (ECCV), 2022
Zekun Li
Zhengyang Geng
Zhao Kang
Wenyu Chen
Jianlong Wu
427
49
0
13 Jul 2022
MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sixiang Chen
Tian-Chun Ye
Yun-Peng Liu
Taodong Liao
Y. Ye
Erkang Chen
Peng Chen
ViT
243
68
0
12 Jul 2022
Long-term Leap Attention, Short-term Periodic Shift for Video Classification
ACM Multimedia (ACM MM), 2022
Huatian Zhang
Lechao Cheng
Y. Hao
Chong-Wah Ngo
ViT
200
11
0
12 Jul 2022
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
Jiashi Li
Xin Xia
W. Li
Huixia Li
Xing Wang
Xuefeng Xiao
Rui Wang
Min Zheng
Xin Pan
ViT
243
200
0
12 Jul 2022
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning
European Conference on Computer Vision (ECCV), 2022
Ting Yao
Yingwei Pan
Yehao Li
Chong-Wah Ngo
Tao Mei
ViT
464
196
0
11 Jul 2022
Dual Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ting Yao
Yehao Li
Yingwei Pan
Yu Wang
Xiaoping Zhang
Tao Mei
ViT
365
113
0
11 Jul 2022
Self-attention on Multi-Shifted Windows for Scene Segmentation
Litao Yu
Zhibin Li
Jian Zhang
Qiang Wu
SSeg
172
1
0
10 Jul 2022
Horizontal and Vertical Attention in Transformers
Litao Yu
Shuai Liu
ViT
148
1
0
10 Jul 2022
QKVA grid: Attention in Image Perspective and Stacked DETR
Wenyuan Sheng
ViT, MU
33
1
0
09 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis
ACM Multimedia (ACM MM), 2022
Yongqiang Wang
Zhou Zhao
307
11
0
08 Jul 2022
MaiT: Leverage Attention Masks for More Efficient Image Transformers
Ling Li
Ali Shafiee Ardestani
Joseph Hassoun
123
1
0
06 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
International Conference on Machine Learning (ICML), 2022
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
271
193
0
06 Jul 2022
OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers
European Conference on Computer Vision (ECCV), 2022
Jialun Pei
Tianyang Cheng
Deng-Ping Fan
He Tang
Chuanbo Chen
Luc Van Gool
ViT
305
80
0
05 Jul 2022
Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention
Gary Leung
Jun Gao
Fangyin Wei
Sanja Fidler
195
3
0
05 Jul 2022
Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yongming Rao
Zuyan Liu
Wenliang Zhao
Jie Zhou
Jiwen Lu
ViT
243
51
0
04 Jul 2022
Masked World Models for Visual Control
Conference on Robot Learning (CoRL), 2022
Younggyo Seo
Danijar Hafner
Hao Liu
Fangchen Liu
Stephen James
Kimin Lee
Pieter Abbeel
OffRL
411
191
0
28 Jun 2022
BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
Gasser Elbanna
Neil Scheidwasser
M. Kegler
P. Beckmann
Karl El Hajal
Milos Cernak
SSL
362
24
0
24 Jun 2022
Vicinity Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Weixuan Sun
Zhen Qin
Huiyuan Deng
Jianyuan Wang
Yi Zhang
Kaihao Zhang
Nick Barnes
Stan Birchfield
Lingpeng Kong
Yiran Zhong
ViT
225
45
0
21 Jun 2022
Global Context Vision Transformers
International Conference on Machine Learning (ICML), 2022
Ali Hatamizadeh
Hongxu Yin
Greg Heinrich
Jan Kautz
Pavlo Molchanov
ViT
475
191
0
20 Jun 2022
Learning Multiscale Transformer Models for Sequence Generation
International Conference on Machine Learning (ICML), 2022
Bei Li
Tong Zheng
Yi Jing
Chengbo Jiao
Tong Xiao
Jingbo Zhu
212
13
0
19 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
International Journal of Computer Vision (IJCV), 2022
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Jianlong Wu
Yong Liu
Dacheng Tao
ViT
305
49
0
19 Jun 2022
SimA: Simple Softmax-free Attention for Vision Transformers
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
306
36
0
17 Jun 2022
Patch-level Representation Learning for Self-supervised Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Sukmin Yun
Hankook Lee
Jaehyung Kim
Jinwoo Shin
ViT
301
77
0
16 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
British Machine Vision Conference (BMVC), 2022
Yuxuan Zhou
Wangmeng Xiang
Chong Li
Biao Wang
Xihan Wei
Lei Zhang
Margret Keuper
Xia Hua
ViT
117
19
0
15 Jun 2022
Efficient Adaptive Ensembling for Image Classification
A. Bruno
Davide Moroni
M. Martinelli
195
23
0
15 Jun 2022
Peripheral Vision Transformer
Neural Information Processing Systems (NeurIPS), 2022
Juhong Min
Yucheng Zhao
Chong Luo
Minsu Cho
ViT, MDE
244
35
0
14 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Computer Vision and Pattern Recognition (CVPR), 2022
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
206
18
0
13 Jun 2022
Spatial Entropy as an Inductive Bias for Vision Transformers
Machine-mediated learning (ML), 2022
E. Peruzzo
E. Sangineto
Yahui Liu
Marco De Nadai
Wei Bi
Bruno Lepri
Andrii Zadaianchuk
ViT, MDE
289
7
0
09 Jun 2022
MobileOne: An Improved One millisecond Mobile Backbone
Computer Vision and Pattern Recognition (CVPR), 2022
Pavan Kumar Anasosalu Vasu
J. Gabriel
Jeff J. Zhu
Oncel Tuzel
Anurag Ranjan
316
252
0
08 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT, MQ
268
384
0
06 Jun 2022
Federated Adversarial Training with Transformers
Ahmed Aldahdooh
W. Hamidouche
Olivier Déforges
FedML, ViT
230
2
0
05 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
Neural Information Processing Systems (NeurIPS), 2022
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
734
536
0
02 Jun 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li
Junyu Chen
Yucheng Tang
Ce Wang
Bennett A. Landman
S. K. Zhou
ViT, OOD, MedIm
434
150
0
02 Jun 2022
The Fully Convolutional Transformer for Medical Image Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Athanasios Tragakis
Chaitanya Kaul
Roderick Murray-Smith
D. Husmeier
ViT, MedIm
170
100
0
01 Jun 2022
Vision GNN: An Image is Worth Graph of Nodes
Neural Information Processing Systems (NeurIPS), 2022
Kai Han
Yunhe Wang
Jianyuan Guo
Yehui Tang
Enhua Wu
GNN, 3DH
328
533
0
01 Jun 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Jingdong Sun
Lingxi Xie
Qi Tian
262
40
0
30 May 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Neural Information Processing Systems (NeurIPS), 2022
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
492
21
0
30 May 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan
Kavitha Viswanathan
S. AnanduA
A. Sethi
466
29
0
28 May 2022
Future Transformer for Long-term Action Anticipation
Computer Vision and Pattern Recognition (CVPR), 2022
Dayoung Gong
Joonseok Lee
Manjin Kim
S. Ha
Minsu Cho
AI4TS
128
83
0
27 May 2022
Green Hierarchical Vision Transformer for Masked Image Modeling
Neural Information Processing Systems (NeurIPS), 2022
Lang Huang
Shan You
Mingkai Zheng
Fei Wang
Chao Qian
T. Yamasaki
294
83
0
26 May 2022