2103.15358
Cited By
v1
v2 (latest)
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Pengchuan Zhang
Xiyang Dai
Jianwei Yang
Bin Xiao
Lu Yuan
Lei Zhang
Jianfeng Gao
ViT
Github (246★)
Papers citing
"Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding"
50 / 197 papers shown
RMT: Retentive Networks Meet Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Qihang Fan
Huaibo Huang
Mingrui Chen
Hongmin Liu
Ran He
ViT
497
159
0
20 Sep 2023
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
European Conference on Computer Vision (ECCV), 2023
Ozan Unal
Daniel Gehrig
Suman Saha
Luc Van Gool
209
27
0
08 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
205
40
0
04 Sep 2023
SG-Former: Self-guided Transformer with Evolving Token Reallocation
IEEE International Conference on Computer Vision (ICCV), 2023
Sucheng Ren
Xingyi Yang
Songhua Liu
Xinchao Wang
ViT
228
61
0
23 Aug 2023
SwinLSTM: Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM
IEEE International Conference on Computer Vision (ICCV), 2023
Song Tang
Chuang Li
Pufen Zhang
R. Tang
AI4TS
132
89
0
19 Aug 2023
Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention
Liang Shang
Yanli Liu
Zhengyang Lou
Shuxue Quan
N. Adluru
Bochen Guan
W. Sethares
256
4
0
10 Aug 2023
Scale-Aware Modulation Meet Transformer
IEEE International Conference on Computer Vision (ICCV), 2023
Wei-Shiang Lin
Ziheng Wu
Jiayu Chen
Jun Huang
Lianwen Jin
MoE
ViT
242
119
0
17 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
216
27
0
13 Jul 2023
YOGA: Deep Object Detection in the Wild with Lightweight Feature Learning and Multiscale Attention
Pattern Recognition (Pattern Recogn.), 2023
Raja Sunkara
Tie-Mei Luo
ObjD
111
13
0
12 Jul 2023
Reviving Shift Equivariance in Vision Transformers
Peijian Ding
Davit Soselia
Thomas Armstrong
Jiahao Su
Furong Huang
217
10
0
13 Jun 2023
FasterViT: Fast Vision Transformers with Hierarchical Attention
International Conference on Learning Representations (ICLR), 2023
Ali Hatamizadeh
Greg Heinrich
Hongxu Yin
Andrew Tao
J. Álvarez
Jan Kautz
Pavlo Molchanov
ViT
311
103
0
09 Jun 2023
Lightweight Vision Transformer with Bidirectional Interaction
Neural Information Processing Systems (NeurIPS), 2023
Qihang Fan
Huaibo Huang
Xiaoqiang Zhou
Xiao-Yu Zhang
ViT
403
38
0
01 Jun 2023
Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification
IEEE journal of biomedical and health informatics (IEEE JBHI), 2023
Saisai Ding
Juncheng Li
Jun Wang
Shihui Ying
Jun Shi
ViT
MedIm
149
18
0
25 May 2023
CageViT: Convolutional Activation Guided Efficient Vision Transformer
Hao Zheng
Jinbao Wang
Xiantong Zhen
Hao Chen
Jingkuan Song
Feng Zheng
ViT
126
1
0
17 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
266
140
0
09 May 2023
SpectFormer: Frequency and Attention is what you need in a Vision Transformer
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Badri N. Patro
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
ViT
156
87
0
13 Apr 2023
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
International Conference on Learning Representations (ICLR), 2023
Ziteng Gao
Zhan Tong
Limin Wang
Mike Zheng Shou
143
14
0
07 Apr 2023
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
Computer Vision and Pattern Recognition (CVPR), 2023
Mingyu Ding
Songlin Yang
Lijie Fan
Zhenfang Chen
Z. Chen
Ping Luo
J. Tenenbaum
Chuang Gan
ViT
221
17
0
06 Apr 2023
Rethinking Local Perception in Lightweight Vision Transformer
Qi Fan
Huaibo Huang
Jiyang Guan
Xiao-Yu Zhang
ViT
286
47
0
31 Mar 2023
Towards Understanding the Effect of Pretraining Label Granularity
Guanzhe Hong
Huayu Chen
Ariel Fuxman
Stanley H. Chan
Enming Luo
170
2
0
29 Mar 2023
Point2Vec for Self-Supervised Representation Learning on Point Clouds
Karim Abou Zeid
Jonas Schult
Alexander Hermans
Bastian Leibe
3DPC
157
43
0
29 Mar 2023
Vision Transformer with Quadrangle Attention
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
146
56
0
27 Mar 2023
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Cong Wei
Brendan Duke
R. Jiang
P. Aarabi
Graham W. Taylor
Florian Shkurti
ViT
173
21
0
24 Mar 2023
CrossFormer++: A Versatile Vision Transformer Hinging on Cross-scale Attention
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Wenxiao Wang
Wei Chen
Qibo Qiu
Long Chen
Boxi Wu
Binbin Lin
Xiaofei He
Wei Liu
180
84
0
13 Mar 2023
Masked Image Modeling with Local Multi-Scale Reconstruction
Computer Vision and Pattern Recognition (CVPR), 2023
Haoqing Wang
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhiwei Deng
Kai Han
157
65
0
09 Mar 2023
FFT-based Dynamic Token Mixer for Vision
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yuki Tatsunami
Masato Taki
267
50
0
07 Mar 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Computer Vision and Pattern Recognition (CVPR), 2023
Zequn Zeng
Hao Zhang
Zhengjue Wang
Ruiying Lu
Dongsheng Wang
Bo Chen
BDL
DiffM
154
54
0
04 Mar 2023
Efficiency 360: Efficient Vision Transformers
Badri N. Patro
Vijay Srinivas Agneeswaran
339
7
0
16 Feb 2023
Efficient Attention via Control Variates
International Conference on Learning Representations (ICLR), 2023
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
245
21
0
09 Feb 2023
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
IEEE transactions on multimedia (IEEE TMM), 2023
Jiayu Jiao
Yuyao Tang
Kun-Li Channing Lin
Yipeng Gao
Jinhua Ma
Yaowei Wang
Wei-Shi Zheng
MedIm
ViT
161
233
0
03 Feb 2023
A Multi-Scale Framework for Out-of-Distribution Detection in Dermoscopic Images
International Conference on Machine Learning for Cyber Security (ICMLCS), 2023
Zhongzheng Huang
Tao Wang
Yuanzheng Cai
Lingyu Liang
140
0
0
18 Jan 2023
Skip-Attention: Improving Vision Transformers by Paying Less Attention
International Conference on Learning Representations (ICLR), 2023
Shashanka Venkataramanan
Amir Ghodrati
Yuki M. Asano
Fatih Porikli
A. Habibian
ViT
196
37
0
05 Jan 2023
Local Learning on Transformers via Feature Reconstruction
P. Pathak
Jingwei Zhang
Dimitris Samaras
ViT
259
6
0
29 Dec 2022
Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Loic Themyr
Clément Rambour
Nicolas Thome
Toby Collins
Alexandre Hostettler
ViT
130
11
0
15 Dec 2022
Video Prediction by Efficient Transformers
Image and Vision Computing (IVC), 2022
Xi Ye
Guillaume-Alexandre Bilodeau
ViT
234
42
0
12 Dec 2022
Lightweight Structure-Aware Attention for Visual Understanding
International Journal of Computer Vision (IJCV), 2022
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Alahari Karteek
153
3
0
29 Nov 2022
Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu
Ping Li
ViT
155
5
0
25 Nov 2022
Aggregated Text Transformer for Scene Text Detection
Zhao Zhou
Xiangcheng Du
Yingbin Zheng
Cheng Jin
ViT
154
1
0
25 Nov 2022
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention
IEEE International Conference on Computer Vision (ICCV), 2022
Wenyuan Zeng
Meng Li
Wenjie Xiong
Tong Tong
Wen-jie Lu
Jin Tan
Runsheng Wang
Ru Huang
273
31
0
25 Nov 2022
UperFormer: A Multi-scale Transformer-based Decoder for Semantic Segmentation
IEEE Transactions on Emerging Topics in Computational Intelligence (IEEE TETCI), 2022
Jing Xu
W. Shi
Pan Gao
Zhengwei Wang
Qizhu Li
ViT
83
1
0
25 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
ACM Multimedia (ACM MM), 2022
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
113
33
0
17 Nov 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
IEEE International Conference on Computer Vision (ICCV), 2022
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
218
8
0
14 Nov 2022
Demystify Self-Attention in Vision Transformers from a Semantic Perspective: Analysis and Application
Leijie Wu
Song Guo
Yaohong Ding
Junxiao Wang
Wenchao Xu
Richard Yi Da Xu
Jiewei Zhang
106
3
0
13 Nov 2022
Boosting Binary Neural Networks via Dynamic Thresholds Learning
Jiehua Zhang
Xueyang Zhang
Z. Su
Zitong Yu
Yanghe Feng
Xin Lu
M. Pietikäinen
Li Liu
MQ
233
0
0
04 Nov 2022
Attention-based Neural Cellular Automata
Neural Information Processing Systems (NeurIPS), 2022
Mattie Tesfaldet
Derek Nowrouzezahrai
C. Pal
ViT
174
24
0
02 Nov 2022
Grafting Vision Transformers
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Jong Sung Park
Kumara Kahatapitiya
Donghyun Kim
Shivchander Sudalairaj
Quanfu Fan
Michael S. Ryoo
ViT
207
3
0
28 Oct 2022
Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
Neural Information Processing Systems (NeurIPS), 2022
Sungjun Cho
Seonwoo Min
Jinwoo Kim
Moontae Lee
Honglak Lee
Seunghoon Hong
190
4
0
27 Oct 2022
Boosting vision transformers for image retrieval
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Chull Hwan Song
Jooyoung Yoon
Shunghyun Choi
Yannis Avrithis
ViT
253
41
0
21 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Neural Information Processing Systems (NeurIPS), 2022
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
224
83
0
12 Oct 2022
Memory transformers for full context and high-resolution 3D Medical Segmentation
Loic Themyr
Clément Rambour
Nicolas Thome
Toby Collins
Alexandre Hostettler
ViT
MedIm
114
5
0
11 Oct 2022