CvT: Introducing Convolutions to Vision Transformers


29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 287 papers shown
Title
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
142
361
0
24 Jan 2022
Improving Chest X-Ray Report Generation by Leveraging Warm Starting
Aaron Nicolson
Jason Dowling
Bevan Koopman
ViT
LM&MA
MedIm
17
90
0
24 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
31
238
0
12 Jan 2022
A ConvNet for the 2020s
Zhuang Liu
Hanzi Mao
Chao-Yuan Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
40
4,967
0
10 Jan 2022
QuadTree Attention for Vision Transformers
Shitao Tang
Jiahui Zhang
Siyu Zhu
Ping Tan
ViT
157
156
0
08 Jan 2022
PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture
Kai Han
Jianyuan Guo
Yehui Tang
Yunhe Wang
ViT
26
22
0
04 Jan 2022
HPRN: Holistic Prior-embedded Relation Network for Spectral Super-Resolution
Chaoxiong Wu
Jiaojiao Li
Rui Song
Yunsong Li
Qian Du
22
15
0
29 Dec 2021
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
Sitong Wu
Tianyi Wu
Hao Hao Tan
G. Guo
ViT
23
70
0
28 Dec 2021
Vision Transformer for Small-Size Datasets
Seung Hoon Lee
Seunghyun Lee
B. Song
ViT
8
222
0
27 Dec 2021
MPViT: Multi-Path Vision Transformer for Dense Prediction
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
29
244
0
21 Dec 2021
Towards End-to-End Image Compression and Analysis with Transformers
Yuanchao Bai
Xu Yang
Xianming Liu
Junjun Jiang
Yaowei Wang
Xiangyang Ji
Wen Gao
ViT
29
51
0
17 Dec 2021
Couplformer: Rethinking Vision Transformer with Coupling Attention Map
Hai Lan
Xihao Wang
Xian Wei
ViT
26
3
0
10 Dec 2021
3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis
Jianhui Yu
Chaoyi Zhang
Heng Wang
Dingxin Zhang
Yang Song
Tiange Xiang
Dongnan Liu
Weidong (Tom) Cai
ViT
MedIm
19
32
0
09 Dec 2021
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai
Srijan Das
Kumara Kahatapitiya
Michael S. Ryoo
F. Brémond
ViT
36
73
0
07 Dec 2021
Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning
DeepMind Interactive Agents Team
Josh Abramson
Arun Ahuja
Arthur Brussee
Federico Carnevale
...
Tamara von Glehn
Greg Wayne
Nathaniel Wong
Chen Yan
Rui Zhu
LM&Ro
32
46
0
07 Dec 2021
BEVT: BERT Pretraining of Video Transformers
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu-Gang Jiang
Luowei Zhou
Lu Yuan
ViT
27
203
0
02 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
46
677
0
02 Dec 2021
Vision Pair Learning: An Efficient Training Framework for Image Classification
Bei Tong
Xiaoyuan Yu
ViT
17
0
0
02 Dec 2021
TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions
Jeya Maria Jose Valanarasu
R. Yasarla
Vishal M. Patel
ViT
39
275
0
29 Nov 2021
On the Integration of Self-Attention and Convolution
Xuran Pan
Chunjiang Ge
Rui Lu
S. Song
Guanfu Chen
Zeyi Huang
Gao Huang
SSL
36
287
0
29 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
23
6
0
26 Nov 2021
Self-slimmed Vision Transformer
Zhuofan Zong
Kunchang Li
Guanglu Song
Yali Wang
Yu Qiao
B. Leng
Yu Liu
ViT
19
30
0
24 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
24
878
0
22 Nov 2021
Searching for TrioNet: Combining Convolution with Local and Global Self-Attention
Huaijin Pi
Huiyu Wang
Yingwei Li
Zizhang Li
Alan Yuille
ViT
19
3
0
15 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
69
330
0
11 Nov 2021
Blending Anti-Aliasing into Vision Transformer
Shengju Qian
Hao Shao
Yi Zhu
Mu Li
Jiaya Jia
21
20
0
28 Oct 2021
MVT: Multi-view Vision Transformer for 3D Object Recognition
Shuo Chen
Tan Yu
Ping Li
ViT
32
43
0
25 Oct 2021
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
24
226
0
18 Oct 2021
Global Vision Transformer Pruning with Hessian-Aware Saliency
Huanrui Yang
Hongxu Yin
Maying Shen
Pavlo Molchanov
Hai Helen Li
Jan Kautz
ViT
30
38
0
10 Oct 2021
Adversarial Token Attacks on Vision Transformers
Ameya Joshi
Gauri Jagatap
C. Hegde
ViT
30
19
0
08 Oct 2021
PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions
Eleonora Grassucci
Aston Zhang
Danilo Comminiello
20
38
0
08 Oct 2021
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
Jihao Liu
Hongsheng Li
Guanglu Song
Xin Huang
Yu Liu
ViT
29
35
0
08 Oct 2021
SERAB: A multi-lingual benchmark for speech emotion recognition
Neil Scheidwasser
M. Kegler
P. Beckmann
Milos Cernak
24
44
0
07 Oct 2021
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Philipp Benz
Soomin Ham
Chaoning Zhang
Adil Karjauv
In So Kweon
AAML
ViT
29
78
0
06 Oct 2021
3rd Place Solution to Google Landmark Recognition Competition 2021
Chengfeng Xu
Weimin Wang
Shuai Liu
Yong Wang
Yuxiang Tang
Tianling Bian
Yanyu Yan
Qi She
Cheng Yang
3DPC
3DV
25
6
0
06 Oct 2021
Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng
Huijie Pan
Lingpeng Kong
21
3
0
06 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
189
1,212
0
05 Oct 2021
UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
106
20
0
29 Sep 2021
BiTr-Unet: a CNN-Transformer Combined Network for MRI Brain Tumor Segmentation
Qiran Jia
Hai Shu
ViT
MedIm
90
69
0
25 Sep 2021
LibFewShot: A Comprehensive Library for Few-shot Learning
Wenbin Li
Ziyi Wang
Xuesong Yang
C. Dong
...
Jing Huo
Yinghuan Shi
Lei Wang
Yang Gao
Jiebo Luo
VLM
108
66
0
10 Sep 2021
Scaled ReLU Matters for Training Vision Transformers
Pichao Wang
Xue Wang
Haowen Luo
Jingkai Zhou
Zhipeng Zhou
Fan Wang
Hao Li
R. L. Jin
13
41
0
08 Sep 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
38
105
0
30 Aug 2021
A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP
Yucheng Zhao
Guangting Wang
Chuanxin Tang
Chong Luo
Wenjun Zeng
Zhengjun Zha
26
69
0
30 Aug 2021
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
172
476
0
12 Aug 2021
TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
Zhengyi Liu
Yuan Wang
Zhengzheng Tu
Yun Xiao
Bin Tang
ViT
22
142
0
09 Aug 2021
Armour: Generalizable Compact Self-Attention for Vision Transformers
Lingchuan Meng
ViT
19
3
0
03 Aug 2021
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu
Zhijie Zhang
Mengdan Zhang
Kekai Sheng
Ke Li
Weiming Dong
Liqing Zhang
Changsheng Xu
Xing Sun
ViT
24
201
0
03 Aug 2021
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
Wenxiao Wang
Lulian Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
32
256
0
31 Jul 2021
Query2Label: A Simple Transformer Way to Multi-Label Classification
Shilong Liu
Lei Zhang
Xiao Yang
Hang Su
Jun Zhu
8
187
0
22 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
19
231
0
21 Jul 2021