Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

25 March 2021
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
B. Guo
    ViT

Papers citing "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"

50 / 1,659 papers shown
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
Giulio Lovisotto
Nicole Finnie
Mauricio Muñoz
Chaithanya Kumar Mummadi
J. H. Metzen
AAML
ViT
17
32
0
25 Mar 2022
Efficient Visual Tracking via Hierarchical Cross-Attention Transformer
Xin Chen
Ben Kang
D. Wang
Dongdong Li
Huchuan Lu
ViT
12
48
0
25 Mar 2022
High-Performance Transformer Tracking
Xin Chen
B. Yan
Jiawen Zhu
Huchuan Lu
Xiang Ruan
D. Wang
ViT
19
33
0
25 Mar 2022
Facial Expression Recognition with Swin Transformer
Jun-Hwa Kim
Namho Kim
C. Won
ViT
21
28
0
25 Mar 2022
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang
Kenta Niwa
W. Kleijn
ODL
11
2
0
24 Mar 2022
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training
Likun Cai
Zhi-Li Zhang
Yi Zhu
Li Zhang
Mu Li
Xiangyang Xue
VLM
ObjD
24
40
0
24 Mar 2022
Facial Expression Classification using Fusion of Deep Neural Network in Video for the 3rd ABAW3 Competition
Kim Ngan Phan
Hong Hai Nguyen
V. Huynh
Soo-Hyung Kim
CVBM
30
12
0
24 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
37
1,114
0
23 Mar 2022
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
22
261
0
22 Mar 2022
Meta-attention for ViT-backed Continual Learning
Mengqi Xue
Haofei Zhang
Jie Song
Mingli Song
CLL
12
41
0
22 Mar 2022
DepthGAN: GAN-based Depth Generation of Indoor Scenes from Semantic Layouts
Yidi Li
Yiqun Wang
Zhengda Lu
Jun Xiao
GAN
3DV
MDE
17
3
0
22 Mar 2022
Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
Yuedong Chen
Qianyi Wu
Chuanxia Zheng
Tat-Jen Cham
Jianfei Cai
45
37
0
21 Mar 2022
ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
Rui Yang
Hailong Ma
Jie Wu
Yansong Tang
Xuefeng Xiao
Min Zheng
Xiu Li
ViT
16
53
0
21 Mar 2022
Harnessing Hard Mixed Samples with Decoupled Regularizer
Zicheng Liu
Siyuan Li
Ge Wang
Cheng Tan
Lirong Wu
Stan Z. Li
51
17
0
21 Mar 2022
Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows
Danyang Tu
Xiongkuo Min
Huiyu Duan
G. Guo
Guangtao Zhai
Wei Shen
ViT
22
24
0
20 Mar 2022
MatchFormer: Interleaving Attention in Transformers for Feature Matching
Qing Wang
Jiaming Zhang
Kailun Yang
Kunyu Peng
Rainer Stiefelhagen
ViT
31
141
0
17 Mar 2022
Towards Data-Efficient Detection Transformers
Wen Wang
Jing Zhang
Yang Cao
Yongliang Shen
Dacheng Tao
ViT
16
56
0
17 Mar 2022
Surgical Workflow Recognition: from Analysis of Challenges to Architectural Study
Tobias Czempiel
Aidean Sharghi
Magdalini Paschali
Nassir Navab
Omid Mohareri
19
8
0
17 Mar 2022
Hyperbolic Uncertainty Aware Semantic Segmentation
Bike Chen
Wei Peng
Xiaofeng Cao
Juha Roning
UQCV
16
15
0
16 Mar 2022
Object discovery and representation networks
Olivier J. Hénaff
Skanda Koppula
Evan Shelhamer
Daniel Zoran
Andrew Jaegle
Andrew Zisserman
João Carreira
Relja Arandjelović
33
87
0
16 Mar 2022
EDTER: Edge Detection with Transformer
Mengyang Pu
Yaping Huang
Yuming Liu
Q. Guan
Haibin Ling
ViT
9
98
0
16 Mar 2022
PointAttN: You Only Need Attention for Point Cloud Completion
Jun Wang
Yinghan Cui
Dongyan Guo
Junxia Li
Qingshan Liu
Chunhua Shen
3DPC
14
44
0
16 Mar 2022
HUMUS-Net: Hybrid unrolled multi-scale network architecture for accelerated MRI reconstruction
Zalan Fabian
Berk Tinaz
Mahdi Soltanolkotabi
25
50
0
15 Mar 2022
Style Transformer for Image Inversion and Editing
Xueqi Hu
Qiusheng Huang
Zhengyi Shi
Siyuan Li
Changxin Gao
Li Sun
Qingli Li
25
55
0
15 Mar 2022
Progressive End-to-End Object Detection in Crowded Scenes
Anlin Zheng
Yuang Zhang
X. Zhang
Xiao Qi
Jian-jun Sun
ObjD
19
60
0
15 Mar 2022
RecursiveMix: Mixed Learning with History
Lingfeng Yang
Xiang Li
Borui Zhao
Renjie Song
Jian Yang
VLM
22
18
0
14 Mar 2022
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Xiaohan Ding
X. Zhang
Yi Zhou
Jungong Han
Guiguang Ding
Jian-jun Sun
VLM
43
522
0
13 Mar 2022
DFTR: Depth-supervised Fusion Transformer for Salient Object Detection
Heqin Zhu
Xu Sun
Yuexiang Li
Kai Ma
S. Kevin Zhou
Yefeng Zheng
ViT
31
9
0
12 Mar 2022
Joint CNN and Transformer Network via weakly supervised Learning for efficient crowd counting
Fusen Wang
Kai Liu
Fei Long
Nong Sang
Xiaofeng Xia
J. Sang
ViT
30
19
0
12 Mar 2022
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
Tianlong Chen
Zhenyu (Allen) Zhang
Yu Cheng
Ahmed Hassan Awadallah
Zhangyang Wang
ViT
20
37
0
12 Mar 2022
PETR: Position Embedding Transformation for Multi-View 3D Object Detection
Yingfei Liu
Tiancai Wang
X. Zhang
Jian-jun Sun
3DPC
15
523
0
10 Mar 2022
Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking
Boyu Chen
Peixia Li
Lei Bai
Leixian Qiao
Qiuhong Shen
Bo-wen Li
Weihao Gan
Wei Wu
Wanli Ouyang
ViT
VOT
20
182
0
10 Mar 2022
Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability
Ruifei He
Shuyang Sun
Jihan Yang
Song Bai
Xiaojuan Qi
14
35
0
10 Mar 2022
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
Jiaming Zhang
Huayao Liu
Kailun Yang
Xinxin Hu
Ruiping Liu
Rainer Stiefelhagen
ViT
21
292
0
09 Mar 2022
Evaluation of YOLO Models with Sliced Inference for Small Object Detection
Muhammed Can Keles
Batuhan Salmanoglu
M. Güzel
Baran Gursoy
Gazi Erkan Bostancı
ObjD
15
11
0
09 Mar 2022
Region-Aware Face Swapping
Chao Xu
Jiangning Zhang
Miao Hua
Qian He
Zili Yi
Yong Liu
CVBM
11
48
0
09 Mar 2022
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Yutong Chen
Fangyun Wei
Xiao Sun
Zhirong Wu
Stephen Lin
SLR
20
94
0
08 Mar 2022
RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation
Hao He
Yuhui Yuan
Xiangyu Yue
Han Hu
VOS
VLM
10
13
0
08 Mar 2022
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
6
34
0
08 Mar 2022
CrowdFormer: Weakly-supervised Crowd counting with Improved Generalizability
Siddharth Singh Savner
Vivek Kanhangad
ViT
19
31
0
07 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
10
1,358
0
07 Mar 2022
CNN self-attention voice activity detector
Amit Sofer
Shlomo E. Chazan
10
8
0
06 Mar 2022
PanFormer: a Transformer Based Model for Pan-sharpening
Huanyu Zhou
Qingjie Liu
Yunhong Wang
ViT
20
42
0
06 Mar 2022
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition
Qishuai Diao
Yi-Xin Jiang
Bin Wen
Jianxiang Sun
Zehuan Yuan
17
60
0
05 Mar 2022
Boosting Crowd Counting via Multifaceted Attention
Hui Lin
Zhiheng Ma
Rongrong Ji
Yaowei Wang
Xiaopeng Hong
23
145
0
05 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT
VLM
17
159
0
04 Mar 2022
F2DNet: Fast Focal Detection Network for Pedestrian Detection
Abdul Hannan Khan
Mohsin Munir
L. V. Elst
Andreas Dengel
ObjD
9
24
0
04 Mar 2022
Correlation-Aware Deep Tracking
Fei Xie
Chunyu Wang
Guangting Wang
Yue Cao
Wankou Yang
Wenjun Zeng
VOT
14
117
0
03 Mar 2022
Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work
Khawar Islam
ViT
24
44
0
03 Mar 2022
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
Weihao Yuan
Xiaodong Gu
Zuozhuo Dai
Siyu Zhu
Ping Tan
23
171
0
03 Mar 2022