2103.15808
Cited By
CvT: Introducing Convolutions to Vision Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
HuggingFace (1 upvote)
Github (227★)
Papers citing
"CvT: Introducing Convolutions to Vision Transformers"
50 / 857 papers shown
Title
Global Vision Transformer Pruning with Hessian-Aware Saliency
Computer Vision and Pattern Recognition (CVPR), 2021
Huanrui Yang
Hongxu Yin
Maying Shen
Pavlo Molchanov
Hai Helen Li
Jan Kautz
ViT
138
72
0
10 Oct 2021
Adversarial Token Attacks on Vision Transformers
Ameya Joshi
Gauri Jagatap
Chinmay Hegde
ViT
149
22
0
08 Oct 2021
PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Eleonora Grassucci
Aston Zhang
Danilo Comminiello
164
43
0
08 Oct 2021
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
European Conference on Computer Vision (ECCV), 2021
Jihao Liu
Jiaming Song
Guanglu Song
Xin Huang
Yu Liu
ViT
199
38
0
08 Oct 2021
SERAB: A multi-lingual benchmark for speech emotion recognition
Neil Scheidwasser
M. Kegler
P. Beckmann
Milos Cernak
175
49
0
07 Oct 2021
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Philipp Benz
Soomin Ham
Chaoning Zhang
Adil Karjauv
In So Kweon
AAML
ViT
210
89
0
06 Oct 2021
3rd Place Solution to Google Landmark Recognition Competition 2021
Chengfeng Xu
Weimin Wang
Shuai Liu
Yong Wang
Yuxiang Tang
Tianling Bian
Yanyu Yan
Qi She
Cheng Yang
3DPC
3DV
175
6
0
06 Oct 2021
Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng
Huijie Pan
Lingpeng Kong
176
4
0
06 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
504
1,782
0
05 Oct 2021
UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
255
25
0
29 Sep 2021
Fine-tuning Vision Transformers for the Prediction of State Variables in Ising Models
Onur Kara
Arijit Sehanobish
H. Corzo
117
4
0
28 Sep 2021
BiTr-Unet: a CNN-Transformer Combined Network for MRI Brain Tumor Segmentation
Qiran Jia
Hai Shu
ViT
MedIm
185
91
0
25 Sep 2021
Audiomer: A Convolutional Transformer For Keyword Spotting
Surya Kant Sahu
Sai Mitheran
Juhi Kamdar
Meet Gandhi
148
8
0
21 Sep 2021
SDTP: Semantic-aware Decoupled Transformer Pyramid for Dense Image Prediction
Zekun Li
Yufan Liu
Bing Li
Weiming Hu
Kebin Wu
Chengwei Peng
ViT
108
24
0
18 Sep 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mańke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
371
177
0
17 Sep 2021
Complementary Feature Enhanced Network with Vision Transformer for Image Dehazing
Dong Zhao
Jia Li
Hongyu Li
Longhao Xu
ViT
168
21
0
15 Sep 2021
LibFewShot: A Comprehensive Library for Few-shot Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Wenbin Li
Ziyi Wang
Xuesong Yang
C. Dong
...
Jing Huo
Yinghuan Shi
Lei Wang
Yang Gao
Jiebo Luo
VLM
320
81
0
10 Sep 2021
Towards Transferable Adversarial Attacks on Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zhipeng Wei
Yue Yu
Micah Goldblum
Zuxuan Wu
Tom Goldstein
Yu-Gang Jiang
ViT
AAML
260
139
0
09 Sep 2021
Scaled ReLU Matters for Training Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2021
Pichao Wang
Qingsong Wen
Haowen Luo
Jingkai Zhou
Zhipeng Zhou
Fan Wang
Hao Li
Rong Jin
200
50
0
08 Sep 2021
Searching for Efficient Multi-Stage Vision Transformers
Yi-Lun Liao
S. Karaman
Vivienne Sze
ViT
93
19
0
01 Sep 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Computer Vision and Pattern Recognition (CVPR), 2021
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
235
114
0
30 Aug 2021
A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP
Yucheng Zhao
Guangting Wang
Chuanxin Tang
Chong Luo
Wenjun Zeng
Zhengjun Zha
158
88
0
30 Aug 2021
Reiterative Domain Aware Multi-Target Adaptation
German Conference on Pattern Recognition (DAGM), 2021
Sudipan Saha
Shan Zhao
Nasrullah Sheikh
Xiao Xiang Zhu
166
2
0
26 Aug 2021
Shifted Chunk Transformer for Spatio-Temporal Representational Learning
Neural Information Processing Systems (NeurIPS), 2021
Xuefan Zha
Wentao Zhu
Tingxun Lv
Sen Yang
Ji Liu
AI4TS
ViT
236
29
0
26 Aug 2021
Transformers predicting the future. Applying attention in next-frame and time series forecasting
Radostin Cholakov
T. Kolev
AI4TS
112
20
0
18 Aug 2021
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
B. Dong
Wenhai Wang
Deng-Ping Fan
Jinpeng Li
Huazhu Fu
Ling Shao
ViT
MedIm
509
436
0
16 Aug 2021
Mobile-Former: Bridging MobileNet and Transformer
Computer Vision and Pattern Recognition (CVPR), 2021
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
734
598
0
12 Aug 2021
ICAF: Iterative Contrastive Alignment Framework for Multimodal Abstractive Summarization
IEEE International Joint Conference on Neural Network (IJCNN), 2021
Zijian Zhang
Chang Shu
Youxin Chen
Jing Xiao
Qian Zhang
Lu Zheng
124
6
0
11 Aug 2021
TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
ACM Multimedia (ACM MM), 2021
Zhengyi Liu
Yuan Wang
Zhengzheng Tu
Yun Xiao
Bin Tang
ViT
279
163
0
09 Aug 2021
Armour: Generalizable Compact Self-Attention for Vision Transformers
Lingchuan Meng
ViT
57
3
0
03 Aug 2021
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu
Zhijie Zhang
Mengdan Zhang
Kekai Sheng
Ke Li
Weiming Dong
Liqing Zhang
Changsheng Xu
Xing Sun
ViT
300
257
0
03 Aug 2021
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
International Conference on Learning Representations (ICLR), 2021
Wenxiao Wang
Lulian Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
429
327
0
31 Jul 2021
Query2Label: A Simple Transformer Way to Multi-Label Classification
Shilong Liu
Lei Zhang
Xiao Yang
Hang Su
Jun Zhu
165
229
0
22 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
International Conference on Learning Representations (ICLR), 2021
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
331
250
0
21 Jul 2021
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao
Suvinay Subramanian
Gaurav Agrawal
Amir Yazdanbakhsh
T. Krishna
326
82
0
13 Jul 2021
Visual Parser: Representing Part-whole Hierarchies with Transformers
Shuyang Sun
Xiaoyu Yue
S. Bai
Juil Sock
207
28
0
13 Jul 2021
Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms
Chenglin Yang
Siyuan Qiao
Adam Kortylewski
Alan Yuille
220
4
0
12 Jul 2021
Local-to-Global Self-Attention in Vision Transformers
Jinpeng Li
Manwen Liao
Tianran Ouyang
Xiaokang Yang
Ling Shao
ViT
105
33
0
10 Jul 2021
ViTGAN: Training GANs with Vision Transformers
International Conference on Learning Representations (ICLR), 2021
Kwonjoon Lee
Huiwen Chang
Lu Jiang
Han Zhang
Zhuowen Tu
Ce Liu
ViT
239
217
0
09 Jul 2021
Vision Xformers: Efficient Attention for Image Classification
Pranav Jeevan
Amit Sethi
ViT
130
14
0
05 Jul 2021
Long-Short Transformer: Efficient Transformers for Language and Vision
Chen Zhu
Ming-Yu Liu
Chaowei Xiao
Mohammad Shoeybi
Tom Goldstein
Anima Anandkumar
Bryan Catanzaro
ViT
VLM
350
156
0
05 Jul 2021
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
662
1,201
0
01 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
237
315
0
01 Jul 2021
Global Filter Networks for Image Classification
Yongming Rao
Wenliang Zhao
Zheng Zhu
Jiwen Lu
Jie Zhou
ViT
248
575
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
242
487
0
01 Jul 2021
Rethinking Token-Mixing MLP for MLP-based Vision Backbone
British Machine Vision Conference (BMVC), 2021
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
158
27
0
28 Jun 2021
Early Convolutions Help Transformers See Better
Neural Information Processing Systems (NeurIPS), 2021
Tete Xiao
Mannat Singh
Eric Mintun
Trevor Darrell
Piotr Dollár
Ross B. Girshick
306
870
0
28 Jun 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Computational Visual Media (CVM), 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
522
2,033
0
25 Jun 2021
ViTAS: Vision Transformer Architecture Search
European Conference on Computer Vision (ECCV), 2021
Xiu Su
Shan You
Jiyang Xie
Mingkai Zheng
Haiwei Yang
Chao Qian
Changshui Zhang
Xiaogang Wang
Chang Xu
ViT
388
55
0
25 Jun 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
335
367
0
24 Jun 2021