Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.11816
Cited By
Incorporating Convolution Designs into Visual Transformers
22 March 2021
Kun Yuan
Shaopeng Guo
Ziwei Liu
Aojun Zhou
F. Yu
Wei Wu
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Incorporating Convolution Designs into Visual Transformers"
50 / 218 papers shown
Title
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
Yuting Gao
Jinfeng Liu
Zihan Xu
Jinchao Zhang
Ke Li
Rongrong Ji
Chunhua Shen
VLM
CLIP
29
100
0
29 Apr 2022
Adaptive Split-Fusion Transformer
Zixuan Su
Hao Zhang
Jingjing Chen
Lei Pang
Chong-Wah Ngo
Yu-Gang Jiang
ViT
19
7
0
26 Apr 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
21
4
0
26 Apr 2022
Residual Mixture of Experts
Lemeng Wu
Mengchen Liu
Yinpeng Chen
Dongdong Chen
Xiyang Dai
Lu Yuan
MoE
22
36
0
20 Apr 2022
DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
Ziyang Luo
Yadong Xi
Jing Ma
Zhiwei Yang
Xiaoxi Mao
Changjie Fan
Rongsheng Zhang
14
3
0
19 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
23
123
0
14 Apr 2022
3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume
Jianye Pang
Cheng Jiang
Yihao Chen
Jianbo Chang
M. Feng
Renzhi Wang
Jianhua Yao
ViT
MedIm
28
11
0
14 Apr 2022
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang
Zilong Huang
Guozhong Luo
Tao Chen
Xinggang Wang
Wenyu Liu
Gang Yu
Chunhua Shen
ViT
22
198
0
12 Apr 2022
CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
Yanan Zhang
Jiaxin Chen
Di Huang
ViT
3DPC
29
59
0
01 Apr 2022
DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation
Zhenyu Li
Zehui Chen
Xianming Liu
Junjun Jiang
ViT
MDE
36
183
1
27 Mar 2022
Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis
K. Zhang
Yawei Li
Jingyun Liang
Jiezhang Cao
Yulun Zhang
H. Tang
Deng-Ping Fan
Radu Timofte
Luc Van Gool
OOD
30
123
0
24 Mar 2022
ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
Rui Yang
Hailong Ma
Jie Wu
Yansong Tang
Xuefeng Xiao
Min Zheng
Xiu Li
ViT
19
53
0
21 Mar 2022
HIPA: Hierarchical Patch Transformer for Single Image Super Resolution
Qing Cai
Yiming Qian
Jinxing Li
Junjie Lv
Yee-Hong Yang
Feng Wu
Dafan Zhang
19
28
0
19 Mar 2022
PanoFormer: Panorama Transformer for Indoor 360 Depth Estimation
Zhijie Shen
Chunyu Lin
K. Liao
Lang Nie
Zishuo Zheng
Yao Zhao
ViT
MDE
27
85
0
17 Mar 2022
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
VLM
25
29
0
13 Mar 2022
CF-ViT: A General Coarse-to-Fine Method for Vision Transformer
Mengzhao Chen
Mingbao Lin
Ke Li
Yunhang Shen
Yongjian Wu
Fei Chao
Rongrong Ji
ViT
38
60
0
08 Mar 2022
Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions
Ruikang Ju
Ting-Yu Lin
Jen-Shiun Chiang
Jia-Hao Jian
Yu-Shian Lin
Liu-Rui-Yi Huang
ViT
16
1
0
02 Mar 2022
Auto-scaling Vision Transformers without Training
Wuyang Chen
Wei Huang
Xianzhi Du
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
27
23
0
24 Feb 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
24
229
0
21 Feb 2022
How Do Vision Transformers Work?
Namuk Park
Songkuk Kim
ViT
32
465
0
14 Feb 2022
Patches Are All You Need?
Asher Trockman
J. Zico Kolter
ViT
225
402
0
24 Jan 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
147
361
0
24 Jan 2022
VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer
Mengshu Sun
Haoyu Ma
Guoliang Kang
Yifan Jiang
Tianlong Chen
Xiaolong Ma
Zhangyang Wang
Yanzhi Wang
ViT
25
45
0
17 Jan 2022
QuadTree Attention for Vision Transformers
Shitao Tang
Jiahui Zhang
Siyu Zhu
Ping Tan
ViT
161
156
0
08 Jan 2022
Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
S. Song
Li Erran Li
Gao Huang
ViT
22
456
0
03 Jan 2022
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
Zhenglun Kong
Peiyan Dong
Xiaolong Ma
Xin Meng
Mengshu Sun
...
Geng Yuan
Bin Ren
Minghai Qin
H. Tang
Yanzhi Wang
ViT
28
141
0
27 Dec 2021
ELSA: Enhanced Local Self-Attention for Vision Transformer
Jingkai Zhou
Pichao Wang
Fan Wang
Qiong Liu
Hao Li
Rong Jin
ViT
34
37
0
23 Dec 2021
Towards End-to-End Image Compression and Analysis with Transformers
Yuanchao Bai
Xu Yang
Xianming Liu
Junjun Jiang
Yaowei Wang
Xiangyang Ji
Wen Gao
ViT
29
51
0
17 Dec 2021
Couplformer:Rethinking Vision Transformer with Coupling Attention Map
Hai Lan
Xihao Wang
Xian Wei
ViT
26
3
0
10 Dec 2021
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
25
6
0
26 Nov 2021
Rethinking Query, Key, and Value Embedding in Vision Transformer under Tiny Model Constraints
Jaesin Ahn
Jiuk Hong
Jeongwoo Ju
Heechul Jung
ViT
24
3
0
19 Nov 2021
INTERN: A New Learning Paradigm Towards General Vision
Jing Shao
Siyu Chen
Yangguang Li
Kun Wang
Zhen-fei Yin
...
F. Yu
Junjie Yan
Dahua Lin
Xiaogang Wang
Yu Qiao
16
34
0
16 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
71
330
0
11 Nov 2021
Blending Anti-Aliasing into Vision Transformer
Shengju Qian
Hao Shao
Yi Zhu
Mu Li
Jiaya Jia
23
20
0
28 Oct 2021
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
24
226
0
18 Oct 2021
Investigating Transfer Learning Capabilities of Vision Transformers and CNNs by Fine-Tuning a Single Trainable Block
Durvesh Malpure
Onkar Litake
Rajesh S. Ingle
ViT
19
5
0
11 Oct 2021
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
Jihao Liu
Hongsheng Li
Guanglu Song
Xin Huang
Yu Liu
ViT
37
35
0
08 Oct 2021
Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng
Huijie Pan
Lingpeng Kong
23
3
0
06 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
206
1,212
0
05 Oct 2021
Scaled ReLU Matters for Training Vision Transformers
Pichao Wang
Xue Wang
Haowen Luo
Jingkai Zhou
Zhipeng Zhou
Fan Wang
Hao Li
R. L. Jin
13
41
0
08 Sep 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
43
105
0
30 Aug 2021
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu
Zhijie Zhang
Mengdan Zhang
Kekai Sheng
Ke Li
Weiming Dong
Liqing Zhang
Changsheng Xu
Xing Sun
ViT
26
201
0
03 Aug 2021
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
16
953
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
42
428
0
01 Jul 2021
Early Convolutions Help Transformers See Better
Tete Xiao
Mannat Singh
Eric Mintun
Trevor Darrell
Piotr Dollár
Ross B. Girshick
20
752
0
28 Jun 2021
IA-RED
2
^2
2
: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan
Rameswar Panda
Yifan Jiang
Zhangyang Wang
Rogerio Feris
A. Oliva
VLM
ViT
39
153
0
23 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
Yu-Huan Wu
Yun-Hai Liu
Xin Zhan
Mingg-Ming Cheng
ViT
24
219
0
22 Jun 2021
Encoder-Decoder Architectures for Clinically Relevant Coronary Artery Segmentation
Joao Lourencco Silva
M. Menezes
T. Rodrigues
B. Silva
F. Pinto
Arlindo L. Oliveira
MedIm
23
17
0
21 Jun 2021
XCiT: Cross-Covariance Image Transformers
Alaaeldin El-Nouby
Hugo Touvron
Mathilde Caron
Piotr Bojanowski
Matthijs Douze
...
Ivan Laptev
Natalia Neverova
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
ViT
28
497
0
17 Jun 2021
Transformed CNNs: recasting pre-trained convolutional layers with self-attention
Stéphane dÁscoli
Levent Sagun
Giulio Biroli
Ari S. Morcos
ViT
10
6
0
10 Jun 2021
Previous
1
2
3
4
5
Next