arXiv: 2103.15808
CvT: Introducing Convolutions to Vision Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
Papers citing "CvT: Introducing Convolutions to Vision Transformers"
50 / 860 papers shown
Brain-inspired Multilayer Perceptron with Spiking Neurons
Computer Vision and Pattern Recognition (CVPR), 2022
Wenshuo Li
Hanting Chen
Jianyuan Guo
Ziyang Zhang
Yunhe Wang
179
40
0
28 Mar 2022
Transformers Meet Visual Learning Understanding: A Comprehensive Review
Yuting Yang
Licheng Jiao
Xuantong Liu
Fan Liu
Shuyuan Yang
Zhixi Feng
Xu Tang
ViT
MedIm
218
35
0
24 Mar 2022
Contrastive Transformer-based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
Yu Tian
Guansong Pang
Fengbei Liu
Yuyuan Liu
Chong Wang
Yuanhong Chen
Johan Verjans
G. Carneiro
ViT
MedIm
279
35
0
23 Mar 2022
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Ryan Grainger
Thomas Paniagua
Xi Song
Naresh P. Cuntoor
Mun Wai Lee
Tianfu Wu
ViT
167
19
0
22 Mar 2022
Focal Modulation Networks
Neural Information Processing Systems (NeurIPS), 2022
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
352
388
0
22 Mar 2022
MixFormer: End-to-End Tracking with Iterative Mixed Attention
Computer Vision and Pattern Recognition (CVPR), 2022
Yutao Cui
Cheng Jiang
Limin Wang
Gangshan Wu
VOT
324
710
0
21 Mar 2022
ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
European Conference on Computer Vision (ECCV), 2022
Rui Yang
Hailong Ma
Jie Wu
Yansong Tang
Xuefeng Xiao
Min Zheng
Xiu Li
ViT
321
64
0
21 Mar 2022
HIPA: Hierarchical Patch Transformer for Single Image Super Resolution
IEEE Transactions on Image Processing (IEEE TIP), 2022
Qing Cai
Yiming Qian
Jinxing Li
Junjie Lv
Yee-Hong Yang
Feng Wu
Dafan Zhang
253
49
0
19 Mar 2022
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Mingxin Huang
Yuliang Liu
Zhenghao Peng
Chongyu Liu
Dahua Lin
Shenggao Zhu
N. Yuan
Kai Ding
Lianwen Jin
ViT
212
138
0
19 Mar 2022
CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance
Computer Vision and Pattern Recognition (CVPR), 2022
Tianchen Zhao
Niansong Zhang
Xuefei Ning
He Wang
Li Yi
Yu Wang
3DPC
ViT
183
11
0
18 Mar 2022
Three things everyone should know about Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Jakob Verbeek
Edouard Grave
ViT
249
155
0
18 Mar 2022
SepTr: Separable Transformer for Audio Spectrogram Processing
Interspeech, 2022
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Fahad Shahbaz Khan
ViT
339
39
0
17 Mar 2022
PanoFormer: Panorama Transformer for Indoor 360 Depth Estimation
European Conference on Computer Vision (ECCV), 2022
Zhijie Shen
Chunyu Lin
K. Liao
Lang Nie
Zishuo Zheng
Yao Zhao
ViT
MDE
180
125
0
17 Mar 2022
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Yang He
Weihan Liang
Dongyang Zhao
Hong-Yu Zhou
Weifeng Ge
Yizhou Yu
Wenqiang Zhang
ViT
253
57
0
17 Mar 2022
Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?
International Conference on Learning Representations (ICLR), 2022
Y. Fu
Shunyao Zhang
Shan-Hung Wu
Cheng Wan
Yingyan Lin
AAML
419
83
0
16 Mar 2022
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
European Conference on Computer Vision (ECCV), 2022
Hanrong Ye
Dan Xu
ViT
267
114
0
15 Mar 2022
Enriched CNN-Transformer Feature Aggregation Networks for Super-Resolution
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Jinsu Yoo
Taehoon Kim
Sihaeng Lee
Seunghyeon Kim
Hankook Lee
Tae Hyun Kim
SupR
ViT
265
88
0
15 Mar 2022
TransCAM: Transformer Attention-based CAM Refinement for Weakly Supervised Semantic Segmentation
Journal of Visual Communication and Image Representation (JVCIR), 2022
Ruiwen Li
Zheda Mai
C. Trabelsi
Zhibo Zhang
Jongseong Jang
Scott Sanner
ViT
198
77
0
14 Mar 2022
Deep Transformers Thirst for Comprehensive-Frequency Data
R. Xia
Chao Xue
Boyu Deng
Fang Wang
Jingchao Wang
ViT
277
0
0
14 Mar 2022
Self-Promoted Supervision for Few-Shot Transformer
European Conference on Computer Vision (ECCV), 2022
Bowen Dong
Pan Zhou
Shuicheng Yan
W. Zuo
ViT
184
52
0
14 Mar 2022
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaohan Ding
Xinming Zhang
Yi Zhou
Jungong Han
Guiguang Ding
Jian Sun
VLM
382
686
0
13 Mar 2022
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
Computer Vision and Pattern Recognition (CVPR), 2022
Tianlong Chen
Zhenyu Zhang
Yu Cheng
Ahmed Hassan Awadallah
Zhangyang Wang
ViT
260
49
0
12 Mar 2022
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
International Conference on Learning Representations (ICLR), 2022
Peihao Wang
Wenqing Zheng
Tianlong Chen
Zhangyang Wang
ViT
289
195
0
09 Mar 2022
ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer
European Conference on Computer Vision (ECCV), 2022
Haokui Zhang
Wenze Hu
Xiaoyu Wang
ViT
352
75
0
08 Mar 2022
Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Kai Liu
Tianyi Wu
Cong Liu
Guodong Guo
ViT
296
20
0
08 Mar 2022
WaveMix: Resource-efficient Token Mixing for Images
Pranav Jeevan
A. Sethi
111
15
0
07 Mar 2022
Stepwise Feature Fusion: Local Guides Global
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
Jinfeng Wang
Qiming Huang
Feilong Tang
Jia Meng
Jionglong Su
Sifan Song
ViT
MedIm
259
247
0
07 Mar 2022
Knowledge Amalgamation for Object Detection with Transformers
IEEE Transactions on Image Processing (IEEE TIP), 2022
Haofei Zhang
Feng Mao
Mengqi Xue
Gongfan Fang
Zunlei Feng
Mingli Song
Weilong Dai
ViT
385
16
0
07 Mar 2022
Multi-Tailed Vision Transformer for Efficient Inference
Neural Networks (NN), 2022
Yunke Wang
Bo Du
Wenyuan Wang
Chang Xu
ViT
594
12
0
03 Mar 2022
ViTransPAD: Video Transformer using convolution and self-attention for Face Presentation Attack Detection
IEEE International Conference on Image Processing (ICIP), 2022
Zuheng Ming
Zitong Yu
M. Al-Ghadi
M. Visani
M. Luqman
J. Burie
ViT
CVBM
153
25
0
03 Mar 2022
Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions
IEEE International Conference on Consumer Electronics (ICCE), 2022
Ruikang Ju
Ting-Yu Lin
Jen-Shiun Chiang
Jia-Hao Jian
Yu-Shian Lin
Liu-Rui-Yi Huang
ViT
147
2
0
02 Mar 2022
3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
Dening Lu
Qian Xie
Linlin Xu
Jonathan Li
3DV
194
93
0
02 Mar 2022
A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark
Yunhe Gao
Mu Zhou
Ding Liu
Zhennan Yan
Shaoting Zhang
Dimitris N. Metaxas
ViT
MedIm
707
97
0
28 Feb 2022
CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising
Physics in Medicine and Biology (PMB), 2022
Dayang Wang
Fenglei Fan
Zhan Wu
R. Liu
Haiwei Yang
Hengyong Yu
ViT
MedIm
181
176
0
28 Feb 2022
Factorizer: A Scalable Interpretable Approach to Context Modeling for Medical Image Segmentation
Pooya Ashtari
Diana Sima
L. De Lathauwer
D. Sappey-Marinier
F. Maes
Sabine Van Huffel
ViT
MedIm
244
47
0
24 Feb 2022
Auto-scaling Vision Transformers without Training
International Conference on Learning Representations (ICLR), 2022
Wuyang Chen
Wei-Ping Huang
Xianzhi Du
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
150
27
0
24 Feb 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
Computer Vision and Pattern Recognition (CVPR), 2022
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xiaolong Wang
ViT
VLM
762
633
0
22 Feb 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
International Journal of Computer Vision (IJCV), 2022
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
287
276
0
21 Feb 2022
Visual Attention Network
Computational Visual Media (CVM), 2022
Meng-Hao Guo
Chengrou Lu
Zheng-Ning Liu
Ming-Ming Cheng
Shi-Min Hu
ViT
VLM
513
887
0
20 Feb 2022
Discriminability-enforcing loss to improve representation learning
Florinel-Alin Croitoru
Diana-Nicoleta Grigore
Radu Tudor Ionescu
FaML
133
1
0
14 Feb 2022
CATs++: Boosting Cost Aggregation with Convolutions and Transformers
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Seokju Cho
Sunghwan Hong
Seungryong Kim
ViT
375
57
0
14 Feb 2022
Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs
Huangjie Zheng
Pengcheng He
Weizhu Chen
Mingyuan Zhou
115
16
0
14 Feb 2022
BViT: Broad Attention based Vision Transformer
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Nannan Li
Yaran Chen
Weifan Li
Zixiang Ding
Dong Zhao
ViT
261
30
0
13 Feb 2022
Feature-level augmentation to improve robustness of deep neural networks to affine transformations
A. Sandru
Mariana-Iuliana Georgescu
Radu Tudor Ionescu
OOD
377
7
0
10 Feb 2022
LwPosr: Lightweight Efficient Fine-Grained Head Pose Estimation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Naina Dhingra
131
17
0
07 Feb 2022
Towards an Analytical Definition of Sufficient Data
SN Computer Science (SN Comput. Sci.), 2022
Adam Byerly
T. Kalganova
227
5
0
07 Feb 2022
Training Vision Transformers with Only 2040 Images
European Conference on Computer Vision (ECCV), 2022
Yunhao Cao
Hao Yu
Jianxin Wu
ViT
392
56
0
26 Jan 2022
Convolutional Xformers for Vision
Pranav Jeevan
Amit Sethi
ViT
167
14
0
25 Jan 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Kunchang Li
Yali Wang
Junhao Zhang
Shiyang Feng
Guanglu Song
Yu Liu
Jiaming Song
Yu Qiao
ViT
546
532
0
24 Jan 2022
Improving Chest X-Ray Report Generation by Leveraging Warm Starting
Aaron Nicolson
Jason Dowling
Bevan Koopman
ViT
LM&MA
MedIm
281
154
0
24 Jan 2022