Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Pengchuan Zhang
Xiyang Dai
Jianwei Yang
Bin Xiao
Lu Yuan
Lei Zhang
Jianfeng Gao
    ViT

Papers citing "Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding"

50 / 197 papers shown
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng Zhang
Chao Zhang
Hanhua Hu
259
39
0
03 Oct 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
313
143
0
30 Sep 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Neurocomputing (Neurocomputing), 2022
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
162
22
0
31 Aug 2022
ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers
Yutong Xie
Jianpeng Zhang
Yong-quan Xia
Anton Van Den Hengel
Qi Wu
186
7
0
28 Aug 2022
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Mocho Go
Hideyuki Tachibana
ViT
166
11
0
24 Aug 2022
Local Perception-Aware Transformer for Aerial Tracking
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Changhong Fu
Wei Peng
Sihang Li
Junjie Ye
Ziang Cao
238
15
0
01 Aug 2022
Global-Local Self-Distillation for Visual Representation Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tim Lebailly
Tinne Tuytelaars
SSL
138
6
0
29 Jul 2022
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger
European Conference on Computer Vision (ECCV), 2022
Cong Wang
Hongmin Xu
Xiong Zhang
Li Wang
Zhitong Zheng
Haifeng Liu
ViT
111
29
0
27 Jul 2022
Efficient High-Resolution Deep Learning: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Arian Bakhtiarnia
Qi Zhang
Alexandros Iosifidis
MedIm
363
37
0
26 Jul 2022
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
European Conference on Computer Vision (ECCV), 2022
Chenyu Yang
W. He
Yingqing Xu
Yang Gao
DiffM
174
43
0
20 Jul 2022
Vision Transformers: From Semantic Segmentation to Dense Prediction
International Journal of Computer Vision (IJCV), 2022
Li Zhang
Jiachen Lu
Sixiao Zheng
Xinxuan Zhao
Xiatian Zhu
Yanwei Fu
Tao Xiang
Jianfeng Feng
Philip H. S. Torr
ViT
281
17
0
19 Jul 2022
Efficient Representation Learning via Adaptive Context Pooling
International Conference on Machine Learning (ICML), 2022
Chen Huang
Walter A. Talbott
Navdeep Jaitly
J. Susskind
201
9
0
05 Jul 2022
Softmax-free Linear Transformers
International Journal of Computer Vision (IJCV), 2022
Jiachen Lu
Junge Zhang
Xiatian Zhu
Jianfeng Feng
Tao Xiang
Li Zhang
ViT
219
15
0
05 Jul 2022
SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences
Design Automation Conference (DAC), 2022
Guan Shen
Jieru Zhao
Quan Chen
Jingwen Leng
Chong Li
Minyi Guo
277
40
0
29 Jun 2022
Vicinity Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Weixuan Sun
Zhen Qin
Huiyuan Deng
Jianyuan Wang
Yi Zhang
Kaihao Zhang
Nick Barnes
Stan Birchfield
Lingpeng Kong
Yiran Zhong
ViT
225
45
0
21 Jun 2022
SimA: Simple Softmax-free Attention for Vision Transformers
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
306
36
0
17 Jun 2022
Patch-level Representation Learning for Self-supervised Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Sukmin Yun
Hankook Lee
Jaehyung Kim
Jinwoo Shin
ViT
293
77
0
16 Jun 2022
Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting
International Conference on Learning Representations (ICLR), 2022
Amin Shabani
A. Abdi
Li Meng
Tristan Sylvain
AI4TS
306
93
0
08 Jun 2022
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Richard J. Chen
Chengkuan Chen
Yicong Li
Tiffany Y. Chen
A. Trister
Rahul G. Krishnan
Faisal Mahmood
ViT
MedIm
343
594
0
06 Jun 2022
Universal Photometric Stereo Network using Global Lighting Contexts
Computer Vision and Pattern Recognition (CVPR), 2022
Satoshi Ikehata
3DV
130
30
0
06 Jun 2022
EAANet: Efficient Attention Augmented Convolutional Networks
Runqing Zhang
Tianshu Zhu
70
0
0
03 Jun 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Jingdong Sun
Lingxi Xie
Qi Tian
255
40
0
30 May 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Neural Information Processing Systems (NeurIPS), 2022
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
492
21
0
30 May 2022
Fast Vision Transformers with HiLo Attention
Neural Information Processing Systems (NeurIPS), 2022
Zizheng Pan
Jianfei Cai
Bohan Zhuang
448
249
0
26 May 2022
Inception Transformer
Neural Information Processing Systems (NeurIPS), 2022
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
354
257
0
25 May 2022
ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions
ACM Transactions on Graphics (TOG), 2022
Difan Liu
Sandesh Shetty
Tobias Hinz
Matthew Fisher
Richard Y. Zhang
Taesung Park
E. Kalogerakis
ViT
193
42
0
24 May 2022
SCVRL: Shuffled Contrastive Video Representation Learning
Michael Dorkenwald
Fanyi Xiao
Biagio Brattoli
Joseph Tighe
Davide Modolo
SSL
171
19
0
24 May 2022
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
Computer Vision and Pattern Recognition (CVPR), 2022
Alexandros Stergiou
Dima Damen
AI4TS
EgoV
EDL
180
14
0
28 Apr 2022
Transformation Invariant Cancerous Tissue Classification Using Spatially Transformed DenseNet
Omar Mahdi
Ali Bou Nassif
MedIm
89
3
0
23 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
234
66
0
18 Apr 2022
Safe Self-Refinement for Transformer-based Domain Adaptation
Computer Vision and Pattern Recognition (CVPR), 2022
Tao Sun
Cheng Lu
Tianshuo Zhang
Haibin Ling
ViT
194
119
0
16 Apr 2022
Neighborhood Attention Transformer
Computer Vision and Pattern Recognition (CVPR), 2022
Ali Hassani
Steven Walton
Jiacheng Li
Shengjia Li
Humphrey Shi
ViT
AI4TS
414
403
0
14 Apr 2022
Linear Complexity Randomized Self-attention Mechanism
International Conference on Machine Learning (ICML), 2022
Lin Zheng
Chong-Jun Wang
Lingpeng Kong
206
36
0
10 Apr 2022
DaViT: Dual Attention Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
382
344
0
07 Apr 2022
Unified Contrastive Learning in Image-Text-Label Space
Computer Vision and Pattern Recognition (CVPR), 2022
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Bin Xiao
Ce Liu
Lu Yuan
Jianfeng Gao
VLM
SSL
311
273
0
07 Apr 2022
Multi-scale Context-aware Network with Transformer for Gait Recognition
Duo-Lin Zhu
Xiaohui Huang
Xinggang Wang
Bo Yang
Botao He
Wenyu Liu
Bin Feng
ViT
CVBM
288
17
0
07 Apr 2022
End-to-End Instance Edge Detection
Xueyan Zou
Haotian Liu
Yong Jae Lee
147
2
0
06 Apr 2022
VPTR: Efficient Transformers for Video Prediction
International Conference on Pattern Recognition (ICPR), 2022
Xi Ye
Guillaume-Alexandre Bilodeau
ViT
242
28
0
29 Mar 2022
Towards Spatio-Temporal Aware Traffic Time Series Forecasting--Full Version
IEEE International Conference on Data Engineering (ICDE), 2022
Razvan-Gabriel Cirstea
B. Yang
Chenjuan Guo
Tung Kieu
Shirui Pan
AI4TS
327
117
0
29 Mar 2022
MatteFormer: Transformer-Based Image Matting via Prior-Tokens
Computer Vision and Pattern Recognition (CVPR), 2022
Gyutae Park
S. Son
Jaeyoung Yoo
Seho Kim
Nojun Kwak
ViT
240
86
0
29 Mar 2022
Transformers Meet Visual Learning Understanding: A Comprehensive Review
Yuting Yang
Licheng Jiao
Xuantong Liu
Fan Liu
Shuyuan Yang
Zhixi Feng
Xu Tang
ViT
MedIm
218
35
0
24 Mar 2022
Beyond Fixation: Dynamic Window Visual Transformer
Computer Vision and Pattern Recognition (CVPR), 2022
Pengzhen Ren
Changlin Li
Guangrun Wang
Yun Xiao
Qing Du
Xiaodan Liang
Xiaojun Chang
ViT
195
45
0
24 Mar 2022
Focal Modulation Networks
Neural Information Processing Systems (NeurIPS), 2022
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
352
384
0
22 Mar 2022
EDTER: Edge Detection with Transformer
Computer Vision and Pattern Recognition (CVPR), 2022
Mengyang Pu
Yaping Huang
Yuming Liu
Q. Guan
Haibin Ling
ViT
280
131
0
16 Mar 2022
Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Kai Liu
Tianyi Wu
Cong Liu
Guodong Guo
ViT
296
20
0
08 Mar 2022
Boosting Crowd Counting via Multifaceted Attention
Computer Vision and Pattern Recognition (CVPR), 2022
Hui Lin
Zhiheng Ma
Rongrong Ji
Yaowei Wang
Xiaopeng Hong
214
200
0
05 Mar 2022
Auto-scaling Vision Transformers without Training
International Conference on Learning Representations (ICLR), 2022
Wuyang Chen
Wei-Ping Huang
Xianzhi Du
Xiaodan Song
Zinan Lin
Denny Zhou
ViT
150
27
0
24 Feb 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
Computer Vision and Pattern Recognition (CVPR), 2022
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViT
VLM
760
633
0
22 Feb 2022
Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for Visual Discrimination
Qingsong Zhao
Shuguang Dou
Zhipeng Zhou
Yangguang Li
Yin Wang
Yu Qiao
Cairong Zhao
245
0
0
21 Feb 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
International Journal of Computer Vision (IJCV), 2022
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
287
274
0
21 Feb 2022