Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

25 March 2021
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
B. Guo
    ViT
arXiv: 2103.14030

Papers citing "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"

50 / 2,048 papers shown
Representation Compensation Networks for Continual Semantic Segmentation
Chang-Bin Zhang
Jianqiang Xiao
Xialei Liu
Ying-Cong Chen
Ming-Ming Cheng
SSeg
CLL
16
93
0
10 Mar 2022
Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking
Boyu Chen
Peixia Li
Lei Bai
Leixian Qiao
Qiuhong Shen
Bo-wen Li
Weihao Gan
Wei Wu
Wanli Ouyang
ViT
VOT
20
182
0
10 Mar 2022
GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains
Lei Fan
Yiwen Ding
Dongdong Fan
Donglin Di
M. Pagnucco
Yang Song
AI4TS
14
19
0
10 Mar 2022
Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability
Ruifei He
Shuyang Sun
Jihan Yang
Song Bai
Xiaojuan Qi
19
35
0
10 Mar 2022
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
Peihao Wang
Wenqing Zheng
Tianlong Chen
Zhangyang Wang
ViT
9
127
0
09 Mar 2022
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
Jiaming Zhang
Huayao Liu
Kailun Yang
Xinxin Hu
Ruiping Liu
Rainer Stiefelhagen
ViT
21
295
0
09 Mar 2022
Evaluation of YOLO Models with Sliced Inference for Small Object Detection
Muhammed Can Keles
Batuhan Salmanoglu
M. Güzel
Baran Gursoy
Gazi Erkan Bostancı
ObjD
15
11
0
09 Mar 2022
Region-Aware Face Swapping
Chao Xu
Jiangning Zhang
Miao Hua
Qian He
Zili Yi
Yong Liu
CVBM
17
48
0
09 Mar 2022
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Yutong Chen
Fangyun Wei
Xiao Sun
Zhirong Wu
Stephen Lin
SLR
20
97
0
08 Mar 2022
RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation
Hao He
Yuhui Yuan
Xiangyu Yue
Han Hu
VOS
VLM
19
13
0
08 Mar 2022
DuMLP-Pin: A Dual-MLP-dot-product Permutation-invariant Network for Set Feature Extraction
Jiajun Fei
Ziyu Zhu
Wenlei Liu
Zhidong Deng
Mingyang Li
Huanjun Deng
Shuo Zhang
3DPC
8
6
0
08 Mar 2022
Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers
H. Chae
Seunghwan Lee
H. Son
Seungjae Han
T. Lim
MedIm
20
3
0
08 Mar 2022
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
17
34
0
08 Mar 2022
CrowdFormer: Weakly-supervised Crowd counting with Improved Generalizability
Siddharth Singh Savner
Vivek Kanhangad
ViT
19
31
0
07 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
34
1,367
0
07 Mar 2022
CNN self-attention voice activity detector
Amit Sofer
Shlomo E. Chazan
10
8
0
06 Mar 2022
PanFormer: a Transformer Based Model for Pan-sharpening
Huanyu Zhou
Qingjie Liu
Yunhong Wang
ViT
20
42
0
06 Mar 2022
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition
Qishuai Diao
Yi-Xin Jiang
Bin Wen
Jianxiang Sun
Zehuan Yuan
22
60
0
05 Mar 2022
Boosting Crowd Counting via Multifaceted Attention
Hui Lin
Zhiheng Ma
Rongrong Ji
Yaowei Wang
Xiaopeng Hong
23
145
0
05 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT
VLM
22
159
0
04 Mar 2022
F2DNet: Fast Focal Detection Network for Pedestrian Detection
Abdul Hannan Khan
Mohsin Munir
L. V. Elst
Andreas Dengel
ObjD
14
24
0
04 Mar 2022
LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network
Zhigang Jiang
Zhongzheng Xiang
Jinhua Xu
Mingbi Zhao
ViT
3DV
11
34
0
03 Mar 2022
Correlation-Aware Deep Tracking
Fei Xie
Chunyu Wang
Guangting Wang
Yue Cao
Wankou Yang
Wenjun Zeng
VOT
19
118
0
03 Mar 2022
Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work
Khawar Islam
ViT
24
44
0
03 Mar 2022
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
Weihao Yuan
Xiaodong Gu
Zuozhuo Dai
Siyu Zhu
Ping Tan
23
172
0
03 Mar 2022
Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation
Jiaming Zhang
Kailun Yang
Chaoxiang Ma
Simon Reiß
Kunyu Peng
Rainer Stiefelhagen
ViT
22
72
0
02 Mar 2022
A Unified Query-based Paradigm for Point Cloud Understanding
Zetong Yang
Li Jiang
Yanan Sun
Bernt Schiele
Jiaya Jia
3DPC
16
38
0
02 Mar 2022
What Makes Transfer Learning Work For Medical Images: Feature Reuse & Other Factors
Christos Matsoukas
Johan Fredin Haslum
Moein Sorkhei
Magnus P Soderberg
Kevin Smith
VLM
OOD
MedIm
22
84
0
02 Mar 2022
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Zhaozheng Chen
Tan Wang
Xiongwei Wu
Xiansheng Hua
Hanwang Zhang
Qianru Sun
WSOL
VLM
18
142
0
02 Mar 2022
TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
ViT
27
32
0
02 Mar 2022
Recent, rapid advancement in visual question answering architecture: a review
V. Kodali
Daniel Berleant
27
9
0
02 Mar 2022
3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
Dening Lu
Qian Xie
Linlin Xu
Jonathan Li
3DV
16
67
0
02 Mar 2022
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan
Yuhong Wang
Gangshan Wu
Limin Wang
39
14
0
01 Mar 2022
Spatio-temporal Vision Transformer for Super-resolution Microscopy
Charles N Christensen
M. Lu
Edward N. Ward
Pietro Liò
C. Kaminski
19
8
0
28 Feb 2022
SUNet: Swin Transformer UNet for Image Denoising
Chi-Mao Fan
Tsung-Jung Liu
Kuan-Hsien Liu
ViT
27
111
0
28 Feb 2022
Real-World Blind Super-Resolution via Feature Matching with Implicit High-Resolution Priors
Chaofeng Chen
Xinyu Shi
Yipeng Qin
Xiaoming Li
Xiaoguang Han
Taojiannan Yang
Shihui Guo
17
113
0
26 Feb 2022
Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance
Zhuoning Yuan
Yuexin Wu
Zi-qi Qiu
Xianzhi Du
Lijun Zhang
Denny Zhou
Tianbao Yang
25
26
0
24 Feb 2022
Factorizer: A Scalable Interpretable Approach to Context Modeling for Medical Image Segmentation
Pooya Ashtari
Diana Sima
L. De Lathauwer
D. Sappey-Marinier
F. Maes
Sabine Van Huffel
ViT
MedIm
17
35
0
24 Feb 2022
Delving Deep into One-Shot Skeleton-based Action Recognition with Diverse Occlusions
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
ViT
19
28
0
23 Feb 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
X. Wang
ViT
VLM
180
499
0
22 Feb 2022
cosFormer: Rethinking Softmax in Attention
Zhen Qin
Weixuan Sun
Huicai Deng
Dongxu Li
Yunshen Wei
Baohong Lv
Junjie Yan
Lingpeng Kong
Yiran Zhong
21
211
0
17 Feb 2022
ScoreNet: Learning Non-Uniform Attention and Augmentation for Transformer-Based Histopathological Image Classification
Thomas Stegmüller
Behzad Bozorgtabar
A. Spahr
Jean-Philippe Thiran
ViT
MedIm
19
42
0
15 Feb 2022
CATs++: Boosting Cost Aggregation with Convolutions and Transformers
Seokju Cho
Sunghwan Hong
Seung Wook Kim
ViT
19
34
0
14 Feb 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
...
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
VLM
32
86
0
14 Feb 2022
How Do Vision Transformers Work?
Namuk Park
Songkuk Kim
ViT
25
463
0
14 Feb 2022
Particle Transformer for Jet Tagging
H. Qu
Congqiao Li
Sitian Qian
ViT
MedIm
17
96
0
08 Feb 2022
GLPanoDepth: Global-to-Local Panoramic Depth Estimation
Jia-Chi Bai
Shuichang Lai
Haoyu Qin
Jie Guo
Yanwen Guo
ViT
MDE
56
21
0
06 Feb 2022
Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath
Christopher Clark
Tanmay Gupta
Eric Kolve
Derek Hoiem
Aniruddha Kembhavi
VLM
19
54
0
04 Feb 2022
Image-to-Image MLP-mixer for Image Reconstruction
Youssef Mansour
Kang Lin
Reinhard Heckel
SupR
20
14
0
04 Feb 2022
Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations
Amin Ghiasi
Hamid Kazemi
Steven Reich
Chen Zhu
Micah Goldblum
Tom Goldstein
29
15
0
31 Jan 2022