Swin Transformer V2: Scaling Up Capacity and Resolution

18 November 2021
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
Yixuan Wei
Jia Ning
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
    ViT
ArXiv (abs) | PDF | HTML | GitHub (14834★)

Papers citing "Swin Transformer V2: Scaling Up Capacity and Resolution"

31 / 931 papers shown
Diverse Imagenet Models Transfer Better
Niv Nayman
A. Golbert
Asaf Noy
Tan Ping
Lihi Zelnik-Manor
151
0
0
19 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
210
65
0
18 Apr 2022
ResT V2: Simpler, Faster and Stronger
Neural Information Processing Systems (NeurIPS), 2022
Qing-Long Zhang
Yubin Yang
ViT
234
29
0
15 Apr 2022
S4OD: Semi-Supervised learning for Single-Stage Object Detection
Yueming Zhang
Xingxu Yao
Chao-Jung Liu
F. Chen
Xiaolin Song
Tengfei Xing
Runbo Hu
Hua Chai
Pengfei Xu
Guoshan Zhang
ObjD
141
7
0
09 Apr 2022
PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model
Juncai Peng
Lu Dong
Shiyu Tang
Yuying Hao
Lutao Chu
...
Baohua Lai
Qiwen Liu
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
SSeg
VLM
183
196
0
06 Apr 2022
Exploring Plain Vision Transformer Backbones for Object Detection
European Conference on Computer Vision (ECCV), 2022
Yanghao Li
Hanzi Mao
Ross B. Girshick
Kaiming He
ViT
595
1,021
0
30 Mar 2022
Focal Modulation Networks
Neural Information Processing Systems (NeurIPS), 2022
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
289
373
0
22 Mar 2022
GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection
Neurocomputing, 2022
Xian Fang
Jin-lei Zhu
Xiuli Shao
Hongpeng Wang
ViT
214
20
0
21 Mar 2022
simCrossTrans: A Simple Cross-Modality Transfer Learning for Object Detection with ConvNets or Vision Transformers
Xiaoke Shen
I. Stamos
ViT
119
5
0
20 Mar 2022
Open Set Recognition using Vision Transformer with an Additional Detection Head
Feiyang Cai
Zhenkai Zhang
Jie Liu
X. Koutsoukos
ViT
90
8
0
16 Mar 2022
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaohan Ding
Xinming Zhang
Yi Zhou
Jungong Han
Guiguang Ding
Jian Sun
VLM
313
662
0
13 Mar 2022
Active Token Mixer
AAAI Conference on Artificial Intelligence (AAAI), 2022
Guoqiang Wei
Zhizheng Zhang
Cuiling Lan
Yan Lu
Zhibo Chen
176
22
0
11 Mar 2022
YouTube-GDD: A challenging gun detection dataset with rich contextual information
Yongxiang Gu
Xingbin Liao
Xiaolin Qin
76
11
0
08 Mar 2022
Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Kai Liu
Tianyi Wu
Cong Liu
Guodong Guo
ViT
230
20
0
08 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
International Conference on Learning Representations (ICLR), 2022
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
661
2,140
0
07 Mar 2022
Fast Neural Architecture Search for Lightweight Dense Prediction Networks
Lam Huynh
Esa Rahtu
Juan E. Sala Matas
J. Heikkilä
184
2
0
03 Mar 2022
Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions
IEEE International Conference on Consumer Electronics (ICCE), 2022
Ruikang Ju
Ting-Yu Lin
Jen-Shiun Chiang
Jia-Hao Jian
Yu-Shian Lin
Liu-Rui-Yi Huang
ViT
128
2
0
02 Mar 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
International Journal of Computer Vision (IJCV), 2022
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
248
271
0
21 Feb 2022
Context Autoencoder for Self-Supervised Representation Learning
International Journal of Computer Vision (IJCV), 2022
Xiaokang Chen
Mingyu Ding
Xiaodi Wang
Ying Xin
Shentong Mo
Yunhao Wang
Shumin Han
Ping Luo
Gang Zeng
Jingdong Wang
SSL
406
446
0
07 Feb 2022
DKM: Dense Kernelized Feature Matching for Geometry Estimation
Computer Vision and Pattern Recognition (CVPR), 2022
Johan Edstedt
Ioannis Athanasiadis
Mårten Wadenbäck
Michael Felsberg
3DV
MDE
356
179
0
01 Feb 2022
Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments
IEEE Transactions on Image Processing (IEEE TIP), 2022
Ming Dai
E. Zheng
Zhenhua Feng
Jiedong Zhuang
Wankou Yang
250
68
0
23 Jan 2022
Video Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
390
136
0
16 Jan 2022
SeMask: Semantically Masked Transformers for Semantic Segmentation
Jitesh Jain
Anukriti Singh
Nikita Orlov
Zilong Huang
Jiachen Li
Steven Walton
Humphrey Shi
ViT
249
117
0
23 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
453
781
0
16 Dec 2021
CPPE-5: Medical Personal Protective Equipment Dataset
Rishit Dagli
A. Shaikh
255
13
0
15 Dec 2021
SimMIM: A Simple Framework for Masked Image Modeling
Zhenda Xie
Zheng Zhang
Yue Cao
Yutong Lin
Jianmin Bao
Zhuliang Yao
Jingdong Sun
Han Hu
397
1,619
0
18 Nov 2021
Are we ready for a new paradigm shift? A Survey on Visual Deep MLP
Ruiyang Liu
Hai-Tao Zheng
Li Tao
Dun Liang
Haitao Zheng
558
112
0
07 Nov 2021
Lightweight Monocular Depth with a Novel Neural Architecture Search Method
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Lam Huynh
Phong H. Nguyen
Jirí Matas
Esa Rahtu
J. Heikkilä
168
11
0
25 Aug 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
359
372
0
24 Jun 2021
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
370
12
0
05 Jun 2021
Visformer: The Vision-friendly Transformer
IEEE International Conference on Computer Vision (ICCV), 2021
Zhengsu Chen
Lingxi Xie
Jianwei Niu
Xuefeng Liu
Longhui Wei
Qi Tian
ViT
475
268
0
26 Apr 2021