arXiv: 2103.15358
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Pengchuan Zhang
Xiyang Dai
Jianwei Yang
Bin Xiao
Lu Yuan
Lei Zhang
Jianfeng Gao
ViT
ArXiv (abs)
PDF
HTML
Github (246★)
Papers citing "Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding"
47 / 197 papers shown
Single UHD Image Dehazing via Interpretable Pyramid Network
Social Science Research Network (SSRN), 2022
Boxue Xiao
Zhuoran Zheng
Xiang Chen
Chengfeng Lv
Yunliang Zhuang
Tao Wang
116
33
0
17 Feb 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Kunchang Li
Yali Wang
Junhao Zhang
Shiyang Feng
Guanglu Song
Yu Liu
Jiaming Song
Yu Qiao
ViT
439
512
0
24 Jan 2022
ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer
IEEE Transactions on Medical Imaging (IEEE TMI), 2022
Pengfei Guo
Yiqun Mei
Jinyuan Zhou
Shanshan Jiang
Vishal M. Patel
ViT
MedIm
218
97
0
23 Jan 2022
Pyramid Fusion Transformer for Semantic Segmentation
IEEE Transactions on Multimedia (IEEE TMM), 2022
Zipeng Qin
Jianbo Liu
Xiaoling Zhang
Maoqing Tian
Aojun Zhou
Shuai Yi
Jiaming Song
ViT
403
27
0
11 Jan 2022
Vision Transformer with Deformable Attention
Computer Vision and Pattern Recognition (CVPR), 2022
Zhuofan Xia
Xuran Pan
Qing Xiao
Li Erran Li
Gao Huang
ViT
390
668
0
03 Jan 2022
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
AAAI Conference on Artificial Intelligence (AAAI), 2021
Sitong Wu
Tianyi Wu
Hao Hao Tan
G. Guo
ViT
226
83
0
28 Dec 2021
Augmenting Convolutional networks with attention-based aggregation
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Piotr Bojanowski
Armand Joulin
Gabriel Synnaeve
Edouard Grave
ViT
190
59
0
27 Dec 2021
ELSA: Enhanced Local Self-Attention for Vision Transformer
Jingkai Zhou
Pichao Wang
Fan Wang
Qiong Liu
Hao Li
Rong Jin
ViT
211
44
0
23 Dec 2021
MPViT: Multi-Path Vision Transformer for Dense Prediction
Computer Vision and Pattern Recognition (CVPR), 2021
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
270
315
0
21 Dec 2021
On Efficient Transformer-Based Image Pre-training for Low-Level Vision
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Wenbo Li
Xin Lu
Shengju Qian
Jiangbo Lu
Xinming Zhang
Jiaya Jia
ViT
218
122
0
19 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
220
120
0
09 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
425
1,359
0
07 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
435
830
0
02 Dec 2021
Shunted Self-Attention via Multi-Scale Token Aggregation
Sucheng Ren
Daquan Zhou
Shengfeng He
Jiashi Feng
Xinchao Wang
ViT
269
277
0
30 Nov 2021
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
Hao Liu
Xinghua Jiang
Xin Li
Zhimin Bao
Deqiang Jiang
Bo Ren
ViT
164
17
0
25 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
377
1,041
0
22 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
ViT
477
2,356
0
18 Nov 2021
A Survey of Visual Transformers
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Peng Wang
Jianping Fan
Zhiqiang He
3DGS
ViT
390
459
0
11 Nov 2021
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2021
Jiaqi Gu
Hyoukjun Kwon
Dilin Wang
Wei Ye
Meng Li
Yu-Hsin Chen
Liangzhen Lai
Vikas Chandra
David Z. Pan
ViT
213
216
0
01 Nov 2021
Blending Anti-Aliasing into Vision Transformer
Neural Information Processing Systems (NeurIPS), 2021
Shengju Qian
Hao Shao
Yi Zhu
Mu Li
Jiaya Jia
183
23
0
28 Oct 2021
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
277
296
0
18 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
544
1,827
0
05 Oct 2021
UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
291
26
0
29 Sep 2021
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation
Tongkun Xu
Weihua Chen
Pichao Wang
Fan Wang
Hao Li
Rong Jin
ViT
518
271
0
13 Sep 2021
Scaled ReLU Matters for Training Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2021
Pichao Wang
Qingsong Wen
Haowen Luo
Jingkai Zhou
Zhipeng Zhou
Fan Wang
Hao Li
Rong Jin
216
51
0
08 Sep 2021
Searching for Efficient Multi-Stage Vision Transformers
Yi-Lun Liao
S. Karaman
Vivienne Sze
ViT
97
19
0
01 Sep 2021
TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
ACM Multimedia (ACM MM), 2021
Zhengyi Liu
Yuan Wang
Zhengzheng Tu
Yun Xiao
Bin Tang
ViT
283
164
0
09 Aug 2021
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
International Conference on Learning Representations (ICLR), 2021
Wenxiao Wang
Lu Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
457
331
0
31 Jul 2021
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao
Suvinay Subramanian
Gaurav Agrawal
Amir Yazdanbakhsh
T. Krishna
374
87
0
13 Jul 2021
Long-Short Transformer: Efficient Transformers for Language and Vision
Chen Zhu
Ming-Yu Liu
Chaowei Xiao
Mohammad Shoeybi
Tom Goldstein
Anima Anandkumar
Bryan Catanzaro
ViT
VLM
366
157
0
05 Jul 2021
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
702
1,215
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
258
493
0
01 Jul 2021
Can An Image Classifier Suffice For Action Recognition?
International Conference on Learning Representations (ICLR), 2021
Quanfu Fan
Chun-Fu Chen
Chen
Yikang Shen
ViT
244
36
0
26 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Yu-Huan Wu
Yun-Hai Liu
Xin Zhan
Ming-Ming Cheng
ViT
518
283
0
22 Jun 2021
Efficient Self-supervised Vision Transformers for Representation Learning
International Conference on Learning Representations (ICLR), 2021
Chunyuan Li
Jianwei Yang
Pengchuan Zhang
Mei Gao
Bin Xiao
Xiyang Dai
Lu Yuan
Jianfeng Gao
ViT
247
221
0
17 Jun 2021
XCiT: Cross-Covariance Image Transformers
Neural Information Processing Systems (NeurIPS), 2021
Alaaeldin El-Nouby
Hugo Touvron
Mathilde Caron
Piotr Bojanowski
Matthijs Douze
...
Ivan Laptev
Natalia Neverova
Gabriel Synnaeve
Jakob Verbeek
Edouard Grave
ViT
331
603
0
17 Jun 2021
S²-MLP: Spatial-Shift MLP Architecture for Vision
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
232
215
0
14 Jun 2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Neural Information Processing Systems (NeurIPS), 2021
Tianlong Chen
Yu Cheng
Zhe Gan
Lu Yuan
Lei Zhang
Zinan Lin
ViT
186
252
0
08 Jun 2021
On the Connection between Local Attention and Dynamic Depth-wise Convolution
International Conference on Learning Representations (ICLR), 2021
Qi Han
Zejia Fan
Jingdong Sun
Lei-huan Sun
Ming-Ming Cheng
Jiaying Liu
Jingdong Wang
ViT
272
131
0
08 Jun 2021
Fully Transformer Networks for Semantic Image Segmentation
Sitong Wu
Tianyi Wu
Fangjian Lin
Sheng Tian
Guodong Guo
ViT
232
47
0
08 Jun 2021
Vision Transformers with Hierarchical Attention
Machine Intelligence Research (MIR), 2021
Yun-Hai Liu
Yu-Huan Wu
Guolei Sun
Le Zhang
Ajad Chhatkuli
Luc Van Gool
ViT
167
68
0
06 Jun 2021
RegionViT: Regional-to-Local Attention for Vision Transformers
International Conference on Learning Representations (ICLR), 2021
Chun-Fu Chen
Yikang Shen
Quanfu Fan
ViT
398
224
0
04 Jun 2021
Container: Context Aggregation Network
Neural Information Processing Systems (NeurIPS), 2021
Peng Gao
Jiasen Lu
Jiaming Song
Roozbeh Mottaghi
Aniruddha Kembhavi
ViT
250
80
0
02 Jun 2021
KVT: k-NN Attention for Boosting Vision Transformers
European Conference on Computer Vision (ECCV), 2021
Pichao Wang
Qingsong Wen
F. Wang
Ming Lin
Shuning Chang
Hao Li
Rong Jin
ViT
229
128
0
28 May 2021
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zizhao Zhang
Han Zhang
Long Zhao
Ting Chen
Sercan O. Arik
Tomas Pfister
ViT
309
202
0
26 May 2021
TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up
Neural Information Processing Systems (NeurIPS), 2021
Lezhi Li
Shiyu Chang
Zinan Lin
ViT
524
456
0
14 Feb 2021
Transformers in Vision: A Survey
ACM Computing Surveys (CSUR), 2021
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
810
3,092
0
04 Jan 2021