2103.14030
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
25 March 2021
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
B. Guo
ViT
Papers citing "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"
50 / 1,659 papers shown
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
Giulio Lovisotto
Nicole Finnie
Mauricio Muñoz
Chaithanya Kumar Mummadi
J. H. Metzen
AAML
ViT
17
32
0
25 Mar 2022
Efficient Visual Tracking via Hierarchical Cross-Attention Transformer
Xin Chen
Ben Kang
D. Wang
Dongdong Li
Huchuan Lu
ViT
12
48
0
25 Mar 2022
High-Performance Transformer Tracking
Xin Chen
B. Yan
Jiawen Zhu
Huchuan Lu
Xiang Ruan
D. Wang
ViT
19
33
0
25 Mar 2022
Facial Expression Recognition with Swin Transformer
Jun-Hwa Kim
Namho Kim
C. Won
ViT
21
28
0
25 Mar 2022
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang
Kenta Niwa
W. Kleijn
ODL
11
2
0
24 Mar 2022
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training
Likun Cai
Zhi-Li Zhang
Yi Zhu
Li Zhang
Mu Li
Xiangyang Xue
VLM
ObjD
24
40
0
24 Mar 2022
Facial Expression Classification using Fusion of Deep Neural Network in Video for the 3rd ABAW3 Competition
Kim Ngan Phan
Hong Hai Nguyen
V. Huynh
Soo-Hyung Kim
CVBM
30
12
0
24 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
37
1,114
0
23 Mar 2022
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
22
261
0
22 Mar 2022
Meta-attention for ViT-backed Continual Learning
Mengqi Xue
Haofei Zhang
Jie Song
Mingli Song
CLL
12
41
0
22 Mar 2022
DepthGAN: GAN-based Depth Generation of Indoor Scenes from Semantic Layouts
Yidi Li
Yiqun Wang
Zhengda Lu
Jun Xiao
GAN
3DV
MDE
17
3
0
22 Mar 2022
Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
Yuedong Chen
Qianyi Wu
Chuanxia Zheng
Tat-Jen Cham
Jianfei Cai
45
37
0
21 Mar 2022
ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
Rui Yang
Hailong Ma
Jie Wu
Yansong Tang
Xuefeng Xiao
Min Zheng
Xiu Li
ViT
16
53
0
21 Mar 2022
Harnessing Hard Mixed Samples with Decoupled Regularizer
Zicheng Liu
Siyuan Li
Ge Wang
Cheng Tan
Lirong Wu
Stan Z. Li
51
17
0
21 Mar 2022
Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows
Danyang Tu
Xiongkuo Min
Huiyu Duan
G. Guo
Guangtao Zhai
Wei Shen
ViT
22
24
0
20 Mar 2022
MatchFormer: Interleaving Attention in Transformers for Feature Matching
Qing Wang
Jiaming Zhang
Kailun Yang
Kunyu Peng
Rainer Stiefelhagen
ViT
31
141
0
17 Mar 2022
Towards Data-Efficient Detection Transformers
Wen Wang
Jing Zhang
Yang Cao
Yongliang Shen
Dacheng Tao
ViT
16
56
0
17 Mar 2022
Surgical Workflow Recognition: from Analysis of Challenges to Architectural Study
Tobias Czempiel
Aidean Sharghi
Magdalini Paschali
Nassir Navab
Omid Mohareri
19
8
0
17 Mar 2022
Hyperbolic Uncertainty Aware Semantic Segmentation
Bike Chen
Wei Peng
Xiaofeng Cao
Juha Roning
UQCV
16
15
0
16 Mar 2022
Object discovery and representation networks
Olivier J. Hénaff
Skanda Koppula
Evan Shelhamer
Daniel Zoran
Andrew Jaegle
Andrew Zisserman
João Carreira
Relja Arandjelović
33
87
0
16 Mar 2022
EDTER: Edge Detection with Transformer
Mengyang Pu
Yaping Huang
Yuming Liu
Q. Guan
Haibin Ling
ViT
9
98
0
16 Mar 2022
PointAttN: You Only Need Attention for Point Cloud Completion
Jun Wang
Yinghan Cui
Dongyan Guo
Junxia Li
Qingshan Liu
Chunhua Shen
3DPC
14
44
0
16 Mar 2022
HUMUS-Net: Hybrid unrolled multi-scale network architecture for accelerated MRI reconstruction
Zalan Fabian
Berk Tinaz
Mahdi Soltanolkotabi
25
50
0
15 Mar 2022
Style Transformer for Image Inversion and Editing
Xueqi Hu
Qiusheng Huang
Zhengyi Shi
Siyuan Li
Changxin Gao
Li Sun
Qingli Li
25
55
0
15 Mar 2022
Progressive End-to-End Object Detection in Crowded Scenes
Anlin Zheng
Yuang Zhang
X. Zhang
Xiao Qi
Jian-jun Sun
ObjD
19
60
0
15 Mar 2022
RecursiveMix: Mixed Learning with History
Lingfeng Yang
Xiang Li
Borui Zhao
Renjie Song
Jian Yang
VLM
22
18
0
14 Mar 2022
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Xiaohan Ding
X. Zhang
Yi Zhou
Jungong Han
Guiguang Ding
Jian-jun Sun
VLM
43
522
0
13 Mar 2022
DFTR: Depth-supervised Fusion Transformer for Salient Object Detection
Heqin Zhu
Xu Sun
Yuexiang Li
Kai Ma
S. Kevin Zhou
Yefeng Zheng
ViT
31
9
0
12 Mar 2022
Joint CNN and Transformer Network via weakly supervised Learning for efficient crowd counting
Fusen Wang
Kai Liu
Fei Long
Nong Sang
Xiaofeng Xia
J. Sang
ViT
30
19
0
12 Mar 2022
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
Tianlong Chen
Zhenyu (Allen) Zhang
Yu Cheng
Ahmed Hassan Awadallah
Zhangyang Wang
ViT
20
37
0
12 Mar 2022
PETR: Position Embedding Transformation for Multi-View 3D Object Detection
Yingfei Liu
Tiancai Wang
X. Zhang
Jian-jun Sun
3DPC
15
523
0
10 Mar 2022
Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking
Boyu Chen
Peixia Li
Lei Bai
Leixian Qiao
Qiuhong Shen
Bo-wen Li
Weihao Gan
Wei Wu
Wanli Ouyang
ViT
VOT
20
182
0
10 Mar 2022
Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability
Ruifei He
Shuyang Sun
Jihan Yang
Song Bai
Xiaojuan Qi
14
35
0
10 Mar 2022
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
Jiaming Zhang
Huayao Liu
Kailun Yang
Xinxin Hu
Ruiping Liu
Rainer Stiefelhagen
ViT
21
292
0
09 Mar 2022
Evaluation of YOLO Models with Sliced Inference for Small Object Detection
Muhammed Can Keles
Batuhan Salmanoglu
M. Güzel
Baran Gursoy
Gazi Erkan Bostancı
ObjD
15
11
0
09 Mar 2022
Region-Aware Face Swapping
Chao Xu
Jiangning Zhang
Miao Hua
Qian He
Zili Yi
Yong Liu
CVBM
11
48
0
09 Mar 2022
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Yutong Chen
Fangyun Wei
Xiao Sun
Zhirong Wu
Stephen Lin
SLR
20
94
0
08 Mar 2022
RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation
Hao He
Yuhui Yuan
Xiangyu Yue
Han Hu
VOS
VLM
10
13
0
08 Mar 2022
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
6
34
0
08 Mar 2022
CrowdFormer: Weakly-supervised Crowd counting with Improved Generalizability
Siddharth Singh Savner
Vivek Kanhangad
ViT
19
31
0
07 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
10
1,358
0
07 Mar 2022
CNN self-attention voice activity detector
Amit Sofer
Shlomo E. Chazan
10
8
0
06 Mar 2022
PanFormer: a Transformer Based Model for Pan-sharpening
Huanyu Zhou
Qingjie Liu
Yunhong Wang
ViT
20
42
0
06 Mar 2022
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition
Qishuai Diao
Yi-Xin Jiang
Bin Wen
Jianxiang Sun
Zehuan Yuan
17
60
0
05 Mar 2022
Boosting Crowd Counting via Multifaceted Attention
Hui Lin
Zhiheng Ma
Rongrong Ji
Yaowei Wang
Xiaopeng Hong
23
145
0
05 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT
VLM
17
159
0
04 Mar 2022
F2DNet: Fast Focal Detection Network for Pedestrian Detection
Abdul Hannan Khan
Mohsin Munir
L. V. Elst
Andreas Dengel
ObjD
9
24
0
04 Mar 2022
Correlation-Aware Deep Tracking
Fei Xie
Chunyu Wang
Guangting Wang
Yue Cao
Wankou Yang
Wenjun Zeng
VOT
14
117
0
03 Mar 2022
Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work
Khawar Islam
ViT
24
44
0
03 Mar 2022
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
Weihao Yuan
Xiaodong Gu
Zuozhuo Dai
Siyu Zhu
Ping Tan
23
171
0
03 Mar 2022