arXiv:2103.15808
CvT: Introducing Convolutions to Vision Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
Papers citing "CvT: Introducing Convolutions to Vision Transformers"
50 / 860 papers shown
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
322
303
0
18 Oct 2021
CAE-Transformer: Transformer-based Model to Predict Invasiveness of Lung Adenocarcinoma Subsolid Nodules from Non-thin Section 3D CT Scans
Shahin Heidarian
Parnian Afshar
A. Oikonomou
Konstantinos N. Plataniotis
Arash Mohammadi
ViT
MedIm
226
3
0
17 Oct 2021
CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for Non-Contrast to Contrast CT Translation
Nicolae-Cătălin Ristea
A. Miron
O. Savencu
Mariana-Iuliana Georgescu
N. Verga
Fahad Shahbaz Khan
Radu Tudor Ionescu
ViT
MedIm
500
30
0
12 Oct 2021
Global Vision Transformer Pruning with Hessian-Aware Saliency
Computer Vision and Pattern Recognition (CVPR), 2021
Huanrui Yang
Hongxu Yin
Maying Shen
Pavlo Molchanov
Hai Helen Li
Jan Kautz
ViT
213
80
0
10 Oct 2021
Adversarial Token Attacks on Vision Transformers
Ameya Joshi
Gauri Jagatap
Chinmay Hegde
ViT
193
23
0
08 Oct 2021
PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Eleonora Grassucci
Aston Zhang
Danilo Comminiello
216
45
0
08 Oct 2021
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
European Conference on Computer Vision (ECCV), 2021
Jihao Liu
Jiaming Song
Guanglu Song
Xin Huang
Yu Liu
ViT
233
38
0
08 Oct 2021
SERAB: A multi-lingual benchmark for speech emotion recognition
Neil Scheidwasser
M. Kegler
P. Beckmann
Milos Cernak
204
52
0
07 Oct 2021
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Philipp Benz
Soomin Ham
Chaoning Zhang
Adil Karjauv
In So Kweon
AAML
ViT
259
89
0
06 Oct 2021
3rd Place Solution to Google Landmark Recognition Competition 2021
Chengfeng Xu
Weimin Wang
Shuai Liu
Yong Wang
Yuxiang Tang
Tianling Bian
Yanyu Yan
Qi She
Cheng Yang
3DPC
3DV
216
6
0
06 Oct 2021
Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng
Huijie Pan
Lingpeng Kong
253
4
0
06 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
649
1,915
0
05 Oct 2021
UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
320
27
0
29 Sep 2021
Fine-tuning Vision Transformers for the Prediction of State Variables in Ising Models
Onur Kara
Arijit Sehanobish
H. Corzo
154
4
0
28 Sep 2021
BiTr-Unet: a CNN-Transformer Combined Network for MRI Brain Tumor Segmentation
Qiran Jia
Hai Shu
ViT
MedIm
236
101
0
25 Sep 2021
Audiomer: A Convolutional Transformer For Keyword Spotting
Surya Kant Sahu
Sai Mitheran
Juhi Kamdar
Meet Gandhi
186
8
0
21 Sep 2021
SDTP: Semantic-aware Decoupled Transformer Pyramid for Dense Image Prediction
Zekun Li
Yufan Liu
Bing Li
Weiming Hu
Kebin Wu
Chengwei Peng
ViT
144
24
0
18 Sep 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mańke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
410
187
0
17 Sep 2021
Complementary Feature Enhanced Network with Vision Transformer for Image Dehazing
Dong Zhao
Jia Li
Hongyu Li
Longhao Xu
ViT
229
23
0
15 Sep 2021
LibFewShot: A Comprehensive Library for Few-shot Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Wenbin Li
Ziyi Wang
Xuesong Yang
C. Dong
...
Jing Huo
Yinghuan Shi
Lei Wang
Yang Gao
Jiebo Luo
VLM
413
83
0
10 Sep 2021
Towards Transferable Adversarial Attacks on Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zhipeng Wei
Yue Yu
Micah Goldblum
Zuxuan Wu
Tom Goldstein
Yu-Gang Jiang
ViT
AAML
345
141
0
09 Sep 2021
Scaled ReLU Matters for Training Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2021
Pichao Wang
Qingsong Wen
Haowen Luo
Jingkai Zhou
Zhipeng Zhou
Fan Wang
Hao Li
Rong Jin
244
52
0
08 Sep 2021
Searching for Efficient Multi-Stage Vision Transformers
Yi-Lun Liao
S. Karaman
Vivienne Sze
ViT
113
19
0
01 Sep 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Computer Vision and Pattern Recognition (CVPR), 2021
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
281
115
0
30 Aug 2021
A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP
Yucheng Zhao
Guangting Wang
Chuanxin Tang
Chong Luo
Wenjun Zeng
Zhengjun Zha
175
91
0
30 Aug 2021
Reiterative Domain Aware Multi-Target Adaptation
German Conference on Pattern Recognition (DAGM), 2021
Sudipan Saha
Shan Zhao
Nasrullah Sheikh
Xiao Xiang Zhu
169
3
0
26 Aug 2021
Shifted Chunk Transformer for Spatio-Temporal Representational Learning
Neural Information Processing Systems (NeurIPS), 2021
Xuefan Zha
Wentao Zhu
Tingxun Lv
Sen Yang
Ji Liu
AI4TS
ViT
308
30
0
26 Aug 2021
Transformers predicting the future. Applying attention in next-frame and time series forecasting
Radostin Cholakov
T. Kolev
AI4TS
159
20
0
18 Aug 2021
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
B. Dong
Wenhai Wang
Deng-Ping Fan
Jinpeng Li
Huazhu Fu
Ling Shao
ViT
MedIm
766
454
0
16 Aug 2021
Mobile-Former: Bridging MobileNet and Transformer
Computer Vision and Pattern Recognition (CVPR), 2021
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
863
623
0
12 Aug 2021
ICAF: Iterative Contrastive Alignment Framework for Multimodal Abstractive Summarization
IEEE International Joint Conference on Neural Network (IJCNN), 2021
Zijian Zhang
Chang Shu
Youxin Chen
Jing Xiao
Qian Zhang
Lu Zheng
176
6
0
11 Aug 2021
TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
ACM Multimedia (ACM MM), 2021
Zhengyi Liu
Yuan Wang
Zhengzheng Tu
Yun Xiao
Bin Tang
ViT
310
167
0
09 Aug 2021
Armour: Generalizable Compact Self-Attention for Vision Transformers
Lingchuan Meng
ViT
62
3
0
03 Aug 2021
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu
Zhijie Zhang
Mengdan Zhang
Kekai Sheng
Ke Li
Weiming Dong
Liqing Zhang
Changsheng Xu
Xing Sun
ViT
374
263
0
03 Aug 2021
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
International Conference on Learning Representations (ICLR), 2021
Wenxiao Wang
Lulian Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
506
343
0
31 Jul 2021
Query2Label: A Simple Transformer Way to Multi-Label Classification
Shilong Liu
Lei Zhang
Xiao Yang
Hang Su
Jun Zhu
200
238
0
22 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
International Conference on Learning Representations (ICLR), 2021
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
395
253
0
21 Jul 2021
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao
Suvinay Subramanian
Gaurav Agrawal
Amir Yazdanbakhsh
T. Krishna
457
88
0
13 Jul 2021
Visual Parser: Representing Part-whole Hierarchies with Transformers
Shuyang Sun
Xiaoyu Yue
S. Bai
Juil Sock
234
28
0
13 Jul 2021
Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms
Chenglin Yang
Siyuan Qiao
Adam Kortylewski
Alan Yuille
261
4
0
12 Jul 2021
Local-to-Global Self-Attention in Vision Transformers
Jinpeng Li
Manwen Liao
Tianran Ouyang
Xiaokang Yang
Ling Shao
ViT
121
35
0
10 Jul 2021
ViTGAN: Training GANs with Vision Transformers
International Conference on Learning Representations (ICLR), 2021
Kwonjoon Lee
Huiwen Chang
Lu Jiang
Han Zhang
Zhuowen Tu
Ce Liu
ViT
351
220
0
09 Jul 2021
Vision Xformers: Efficient Attention for Image Classification
Pranav Jeevan
Amit Sethi
ViT
178
14
0
05 Jul 2021
Long-Short Transformer: Efficient Transformers for Language and Vision
Chen Zhu
Ming-Yu Liu
Chaowei Xiao
Mohammad Shoeybi
Tom Goldstein
Anima Anandkumar
Bryan Catanzaro
ViT
VLM
442
161
0
05 Jul 2021
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
795
1,236
0
01 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
300
322
0
01 Jul 2021
Global Filter Networks for Image Classification
Yongming Rao
Wenliang Zhao
Zheng Zhu
Jiwen Lu
Jie Zhou
ViT
303
611
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
353
501
0
01 Jul 2021
Rethinking Token-Mixing MLP for MLP-based Vision Backbone
British Machine Vision Conference (BMVC), 2021
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
197
27
0
28 Jun 2021
Early Convolutions Help Transformers See Better
Neural Information Processing Systems (NeurIPS), 2021
Tete Xiao
Mannat Singh
Eric Mintun
Trevor Darrell
Piotr Dollár
Ross B. Girshick
377
887
0
28 Jun 2021