Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

IEEE International Conference on Computer Vision (ICCV), 2021
25 March 2021
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
    ViT
ArXiv (abs), PDF, HTML, HuggingFace (5 upvotes), GitHub (14835★)

Papers citing "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"

50 / 8,525 papers shown
Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation
Zhiwei Hao
Jianyuan Guo
Ding Jia
Kai Han
Yehui Tang
Chao Zhang
Dacheng Tao
Yunhe Wang
ViT
442
89
0
03 Jul 2021
1st Place Solutions for UG2+ Challenge 2021 -- (Semi-)supervised Face detection in the low light condition
Pengcheng Wang
Ling Ji
Zhilong Ji
Yuan Gao
Xiao-Chang Liu
CVBM
106
0
0
02 Jul 2021
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
803
1,244
0
01 Jul 2021
Global Filter Networks for Image Classification
Yongming Rao
Wenliang Zhao
Zheng Zhu
Jiwen Lu
Jie Zhou
ViT
304
611
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
353
502
0
01 Jul 2021
CBNet: A Composite Backbone Network Architecture for Object Detection
Tingting Liang
Xiao Chu
Yudong Liu
Yongtao Wang
Zhi Tang
Wei Chu
Jingdong Chen
Haibin Ling
ObjD
555
206
0
01 Jul 2021
Simple Training Strategies and Model Scaling for Object Detection
Xianzhi Du
Barret Zoph
Wei-Chih Hung
Nayeon Lee
ObjD
239
50
0
30 Jun 2021
Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2021
L. Ding
Dong Lin
Shaofu Lin
Jing Zhang
Xiaojie Cui
Yuebin Wang
Hao Tang
Lorenzo Bruzzone
ViT
547
130
0
29 Jun 2021
Rethinking Token-Mixing MLP for MLP-based Vision Backbone
British Machine Vision Conference (BMVC), 2021
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
197
27
0
28 Jun 2021
Early Convolutions Help Transformers See Better
Neural Information Processing Systems (NeurIPS), 2021
Tete Xiao
Mannat Singh
Eric Mintun
Trevor Darrell
Piotr Dollár
Ross B. Girshick
377
887
0
28 Jun 2021
K-Net: Towards Unified Image Segmentation
Neural Information Processing Systems (NeurIPS), 2021
Wenwei Zhang
Jiangmiao Pang
Kai-xiang Chen
Chen Change Loy
ISeg
334
442
0
28 Jun 2021
R-Drop: Regularized Dropout for Neural Networks
Neural Information Processing Systems (NeurIPS), 2021
Xiaobo Liang
Lijun Wu
Juntao Li
Yue Wang
Qi Meng
Tao Qin
Wei Chen
Hao Fei
Tie-Yan Liu
303
518
0
28 Jun 2021
Can An Image Classifier Suffice For Action Recognition?
International Conference on Learning Representations (ICLR), 2021
Quanfu Fan
Chun-Fu Chen
Chen
Yikang Shen
ViT
291
38
0
26 Jun 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Computational Visual Media (CVM), 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
791
2,143
0
25 Jun 2021
ViTAS: Vision Transformer Architecture Search
European Conference on Computer Vision (ECCV), 2021
Xiu Su
Shan You
Jiyang Xie
Mingkai Zheng
Haiwei Yang
Chao Qian
Changshui Zhang
Xiaogang Wang
Chang Xu
ViT
459
56
0
25 Jun 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
414
94
0
25 Jun 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
495
1,884
0
24 Jun 2021
Exploring Corruption Robustness: Inductive Biases in Vision Transformers and MLP-Mixers
Katelyn Morrison
B. Gilby
Colton Lipchak
Adam Mattioli
Adriana Kovashka
ViT
176
17
0
24 Jun 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
424
378
0
24 Jun 2021
Advancing biological super-resolution microscopy through deep learning: a brief review
Tianjie Yang
Yaoru Luo
Wei Ji
Ge Yang
SupR
175
25
0
24 Jun 2021
Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
Haixu Wu
Jiehui Xu
Jianmin Wang
Mingsheng Long
AI4TS
518
3,779
0
24 Jun 2021
IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan
Yikang Shen
Lezhi Li
Zinan Lin
Rogerio Feris
A. Oliva
VLM
ViT
329
191
0
23 Jun 2021
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
Shengjie Luo
Shanda Li
Tianle Cai
Di He
Dinglan Peng
Shuxin Zheng
Guolin Ke
Liwei Wang
Tie-Yan Liu
211
56
0
23 Jun 2021
Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images
Libo Wang
Rui Li
Dongzhi Wang
Chenxi Duan
Teng Wang
Xiaoliang Meng
ViT
259
208
0
23 Jun 2021
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition
Qibin Hou
Zihang Jiang
Li-xin Yuan
Ming-Ming Cheng
Shuicheng Yan
Jiashi Feng
ViT
MLLM
306
236
0
23 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Yu-Huan Wu
Yun-Hai Liu
Xin Zhan
Ming-Ming Cheng
ViT
613
289
0
22 Jun 2021
Tracking Instances as Queries
Shusheng Yang
Yuxin Fang
Xinggang Wang
Yu Li
Ying Shan
Bin Feng
Wenyu Liu
175
11
0
22 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
649
155
0
21 Jun 2021
SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving
Jianhua Han
Xiwen Liang
Hang Xu
Kai Chen
Lanqing Hong
...
Chao Ye
Wei Zhang
Zhenguo Li
Xi Liang
Chunjing Xu
224
103
0
21 Jun 2021
More than Encoder: Introducing Transformer Decoder to Upsample
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021
Yijiang Li
Wentian Cai
Ying Gao
Chengming Li
Xiping Hu
ViT
MedIm
254
75
0
20 Jun 2021
MSN: Efficient Online Mask Selection Network for Video Instance Segmentation
Vidit Goel
Jiachen Li
Shubhika Garg
Harsh Maheshwari
Humphrey Shi
231
9
0
19 Jun 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
345
776
0
18 Jun 2021
Efficient Self-supervised Vision Transformers for Representation Learning
International Conference on Learning Representations (ICLR), 2021
Chunyuan Li
Jianwei Yang
Pengchuan Zhang
Mei Gao
Bin Xiao
Xiyang Dai
Lu Yuan
Jianfeng Gao
ViT
306
224
0
17 Jun 2021
XCiT: Cross-Covariance Image Transformers
Neural Information Processing Systems (NeurIPS), 2021
Alaaeldin El-Nouby
Hugo Touvron
Mathilde Caron
Piotr Bojanowski
Matthijs Douze
...
Ivan Laptev
Natalia Neverova
Gabriel Synnaeve
Jakob Verbeek
Edouard Grave
ViT
446
614
0
17 Jun 2021
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang
Gedas Bertasius
Du Tran
Lorenzo Torresani
VLM
ViT
348
56
0
17 Jun 2021
End-to-End Semi-Supervised Object Detection with Soft Teacher
Mengde Xu
Zheng Zhang
Han Hu
Jianfeng Wang
Lijuan Wang
Fangyun Wei
X. Bai
Zicheng Liu
350
586
0
16 Jun 2021
Shuffle Transformer with Feature Alignment for Video Face Parsing
Rui Zhang
Yang Han
Zilong Huang
Pei Cheng
Guozhong Luo
Gang Yu
Bin-Bin Fu
CVBM
ViT
181
1
0
16 Jun 2021
Temporal Convolution Networks with Positional Encoding for Evoked Expression Estimation
V. Huynh
Gueesang Lee
Hyung-Jeong Yang
Soohyung Kim
146
4
0
16 Jun 2021
ICDAR 2021 Competition on Components Segmentation Task of Document Photos
C. A. M. L. Junior
R. B. D. N. Junior
B. Bezerra
Alejandro H. Toselli
D. Impedovo
155
2
0
16 Jun 2021
Dynamic Head: Unifying Object Detection Heads with Attentions
Xiyang Dai
Yinpeng Chen
Bin Xiao
Dongdong Chen
Mengchen Liu
Lu Yuan
Lei Zhang
232
803
0
15 Jun 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
856
3,424
0
15 Jun 2021
Improved Transformer for High-Resolution GANs
Neural Information Processing Systems (NeurIPS), 2021
Long Zhao
Zizhao Zhang
Ting Chen
Dimitris N. Metaxas
Han Zhang
ViT
352
109
0
14 Jun 2021
S$^2$-MLP: Spatial-Shift MLP Architecture for Vision
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
261
219
0
14 Jun 2021
3rd Place Solution for Short-video Face Parsing Challenge
Xiao Liu
Xiaofei Si
Jiangtao Xie
CVBM
135
0
0
14 Jun 2021
Pre-Trained Models: Past, Present and Future
AI Open (AO), 2021
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
390
995
0
14 Jun 2021
Styleformer: Transformer based Generative Adversarial Networks with Style Vector
Computer Vision and Pattern Recognition (CVPR), 2021
Jeeseung Park
Younggeun Kim
ViT
314
59
0
13 Jun 2021
DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation
IEEE Transactions on Instrumentation and Measurement (IEEE Trans. Instrum. Meas.), 2021
Ai-Jun Lin
Bingzhi Chen
Jiayu Xu
Zheng Zhang
Guangming Lu
ViT
MedIm
287
820
0
12 Jun 2021
1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation
Thuy C. Nguyen
Tuan N. Tang
N. Phan
Chuong H. Nguyen
Masayuki Yamazaki
Masao Yamanaka
156
6
0
12 Jun 2021
MlTr: Multi-label Classification with Transformer
IEEE International Conference on Multimedia and Expo (ICME), 2021
Xingyi Cheng
Hezheng Lin
Xiangyu Wu
Fan Yang
Dong Shen
Zhongyuan Wang
Nian Shi
Honglin Liu
ViT
176
58
0
11 Jun 2021
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Liangqiong Qu
Yuyin Zhou
Paul Pu Liang
Yingda Xia
Feifei Wang
Ehsan Adeli
L. Fei-Fei
D. Rubin
FedML
AI4CE
414
216
0
10 Jun 2021