2103.14030
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
25 March 2021
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
Papers citing
"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"
50 / 2,048 papers shown
Title
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
8
88
0
31 Jan 2022
VRT: A Video Restoration Transformer
Jingyun Liang
Jiezhang Cao
Yuchen Fan
K. Zhang
Rakesh Ranjan
Yawei Li
Radu Timofte
Luc Van Gool
ViT
21
251
0
28 Jan 2022
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
Ziyu Wang
Wenhao Jiang
Yiming Zhu
Li Yuan
Yibing Song
Wei Liu
27
43
0
28 Jan 2022
You Only Cut Once: Boosting Data Augmentation with a Single Cut
Junlin Han
Pengfei Fang
Weihong Li
Jie Hong
M. Armin
Ian Reid
L. Petersson
Hongdong Li
25
27
0
28 Jan 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
142
361
0
24 Jan 2022
PETS-SWINF: A regression method that considers images with metadata based Neural Network for pawpularity prediction on 2021 Kaggle Competition "PetFinder.my"
Yizheng Wang
Yinghua Liu
18
2
0
16 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
102
0
16 Jan 2022
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?
Nenad Tomašev
Ioana Bica
Brian McWilliams
Lars Buesing
Razvan Pascanu
Charles Blundell
Jovana Mitrović
SSL
66
80
0
13 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
31
235
0
12 Jan 2022
Knee Cartilage Defect Assessment by Graph Representation and Surface Convolution
Zixu Zhuang
Liping Si
Sheng Wang
Kai Xuan
Xi Ouyang
...
Zhong Xue
Lichi Zhang
D. Shen
Weiwu Yao
Qian Wang
35
5
0
12 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
24
211
0
12 Jan 2022
A ConvNet for the 2020s
Zhuang Liu
Hanzi Mao
Chao-Yuan Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
40
4,945
0
10 Jan 2022
QuadTree Attention for Vision Transformers
Shitao Tang
Jiahui Zhang
Siyu Zhu
Ping Tan
ViT
157
156
0
08 Jan 2022
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images
Ali Hatamizadeh
V. Nath
Yucheng Tang
Dong Yang
H. Roth
Daguang Xu
ViT
MedIm
17
1,037
0
04 Jan 2022
PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture
Kai Han
Jianyuan Guo
Yehui Tang
Yunhe Wang
ViT
24
22
0
04 Jan 2022
RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark
Zhuo Deng
Yuanhao Cai
Lu Chen
Zheng Gong
Qiqi Bao
Xue Yao
D. Fang
Shaochong Zhang
Lan Ma
ViT
MedIm
22
53
0
03 Jan 2022
Robust Region Feature Synthesizer for Zero-Shot Object Detection
Peiliang Huang
Junwei Han
De-Chun Cheng
Dingwen Zhang
ObjD
26
39
0
01 Jan 2022
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
Sitong Wu
Tianyi Wu
Hao Hao Tan
G. Guo
ViT
23
70
0
28 Dec 2021
Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction
Jing Zhang
Jianwen Xie
Nick Barnes
Ping Li
ViT
35
90
0
27 Dec 2021
Vision Transformer for Small-Size Datasets
Seung Hoon Lee
Seunghyun Lee
B. Song
ViT
8
222
0
27 Dec 2021
Learning Cross-Scale Weighted Prediction for Efficient Neural Video Compression
Zongyu Guo
Runsen Feng
Zhizheng Zhang
Xin Jin
Zhibo Chen
19
15
0
26 Dec 2021
Raw Produce Quality Detection with Shifted Window Self-Attention
Oh Joon Kwon
Byungsoo Kim
Youngduck Choi
ViT
22
0
0
24 Dec 2021
SeMask: Semantically Masked Transformers for Semantic Segmentation
Jitesh Jain
Anukriti Singh
Nikita Orlov
Zilong Huang
Jiachen Li
Steven Walton
Humphrey Shi
ViT
24
92
0
23 Dec 2021
iSegFormer: Interactive Segmentation via Transformers with Application to 3D Knee MR Images
Qin Liu
Zhenlin Xu
Yining Jiao
Marc Niethammer
ViT
MedIm
34
35
0
21 Dec 2021
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
Xiaohan Ding
Honghao Chen
X. Zhang
Jungong Han
Guiguang Ding
14
70
0
21 Dec 2021
MPViT: Multi-Path Vision Transformer for Dense Prediction
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
13
243
0
21 Dec 2021
3D Instance Segmentation of MVS Buildings
Jiazhou Chen
Yanghui Xu
Shufang Lu
Ronghua Liang
Liangliang Nan
ISeg
3DV
16
23
0
18 Dec 2021
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation
Wuyang Chen
Xianzhi Du
Fan Yang
Lucas Beyer
Xiaohua Zhai
...
Huizhong Chen
Jing Li
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
21
20
0
17 Dec 2021
Efficient Visual Tracking with Exemplar Transformers
Philippe Blatter
Menelaos Kanakis
Martin Danelljan
Luc Van Gool
ViT
16
79
0
17 Dec 2021
Towards End-to-End Image Compression and Analysis with Transformers
Yuanchao Bai
Xu Yang
Xianming Liu
Junjun Jiang
Yaowei Wang
Xiangyang Ji
Wen Gao
ViT
26
51
0
17 Dec 2021
HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images
A. Athar
Jonathon Luiten
Alexander Hermans
Deva Ramanan
Bastian Leibe
VOS
22
25
0
16 Dec 2021
CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
Qi Yan
Jianhao Zheng
Simon Reding
Shanci Li
I. Doytchinov
30
20
0
16 Dec 2021
Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation
Yi Zhou
Hui Zhang
Hana Lee
Shuyang Sun
Pingjun Li
Yangguang Zhu
ByungIn Yoo
Xiaojuan Qi
Jae-Joon Han
VOS
25
26
0
16 Dec 2021
Deep Hash Distillation for Image Retrieval
Young Kyun Jang
Geonmo Gu
ByungSoo Ko
Isaac Kang
N. Cho
19
34
0
16 Dec 2021
QAHOI: Query-Based Anchors for Human-Object Interaction Detection
Junwen Chen
Keiji Yanai
18
40
0
16 Dec 2021
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
20
54
0
14 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Yuxuan Liang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
21
21
0
09 Dec 2021
Recurrent Glimpse-based Decoder for Detection with Transformer
Zhe Chen
Jing Zhang
Dacheng Tao
ViT
19
30
0
09 Dec 2021
Improving Image Restoration by Revisiting Global Information Aggregation
Xiaojie Chu
Liangyu Chen
Chengpeng Chen
Xin Lu
17
87
0
08 Dec 2021
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks
Linghui Meng
Muning Wen
Yaodong Yang
Chenyang Le
Xiyun Li
Weinan Zhang
Ying Wen
Haifeng Zhang
Jun Wang
Bo Xu
OffRL
14
38
0
06 Dec 2021
Visual Object Tracking with Discriminative Filters and Siamese Networks: A Survey and Outlook
S. Javed
Martin Danelljan
F. Khan
Muhammad Haris Khan
M. Felsberg
Jiří Matas
32
126
0
06 Dec 2021
Learning Tracking Representations via Dual-Branch Fully Transformer Networks
Fei Xie
Chunyu Wang
Guangting Wang
Wankou Yang
Wenjun Zeng
ViT
14
47
0
05 Dec 2021
U2-Former: A Nested U-shaped Transformer for Image Restoration
Haobo Ji
Xin Feng
Wenjie Pei
Jinxing Li
Guangming Lu
ViT
22
26
0
04 Dec 2021
BEVT: BERT Pretraining of Video Transformers
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu-Gang Jiang
Luowei Zhou
Lu Yuan
ViT
25
203
0
02 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
A. Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
31
2,256
0
02 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
46
676
0
02 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
36
128
0
02 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLM
CLIP
61
550
0
02 Dec 2021
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
F. Khan
Michael S. Ryoo
ViT
24
84
0
02 Dec 2021
SwinTrack: A Simple and Strong Baseline for Transformer Tracking
Liting Lin
Heng Fan
Zhipeng Zhang
Yong-mei Xu
Haibin Ling
ViT
12
301
0
02 Dec 2021