Swin Transformer V2: Scaling Up Capacity and Resolution

18 November 2021
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
Yixuan Wei
Jia Ning
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
    ViT
ArXiv (abs) | PDF | HTML | GitHub (14834★)

Papers citing "Swin Transformer V2: Scaling Up Capacity and Resolution"

50 / 931 papers shown
A Simple and Generalist Approach for Panoptic Segmentation
Nedyalko Prisadnikov
Wouter Van Gansbeke
Danda Pani Paudel
Luc Van Gool
VLM
368
1
0
29 Aug 2024
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships
Gracile Astlin Pereira
Muhammad Hussain
ViT
192
31
0
27 Aug 2024
Sapiens: Foundation for Human Vision Models
European Conference on Computer Vision (ECCV), 2024
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Forrest Iandola
VLM
394
162
0
22 Aug 2024
HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation
Mingya Zhang
Zhihao Chen
Yiyuan Ge
Xianping Tao
Mamba
216
8
0
21 Aug 2024
MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification
Huafeng Qin
Yuming Fu
Huiyan Zhang
M. El-Yacoubi
Xinbo Gao
Qun Song
Jun Wang
GAN
AAML
214
0
0
20 Aug 2024
Flatten: Video Action Recognition is an Image Classification task
Junlin Chen
Chengcheng Xu
Yangfan Xu
Zhiqiang Wang
Jun Yu Li
Zhiping Shi
216
2
0
17 Aug 2024
Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Li Pan
Yupei Zhang
Qiushi Yang
Tan Li
Xiaohan Xing
Maximus C. F. Yeung
Zhen Chen
164
3
0
16 Aug 2024
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks
Computer Vision and Pattern Recognition (CVPR), 2024
Dongshuo Yin
Leiyi Hu
Bin Li
Youqun Zhang
Xue Yang
344
31
0
15 Aug 2024
Heavy Labels Out! Dataset Distillation with Label Space Lightening
Ruonan Yu
Songhua Liu
Zigeng Chen
Jingwen Ye
Xinchao Wang
DD
282
3
0
15 Aug 2024
GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution
ACM Multimedia (MM), 2024
Yuzhen Li
Zehang Deng
Yuxin Cao
Lihua Liu
90
5
0
14 Aug 2024
Advanced Vision Transformers and Open-Set Learning for Robust Mosquito Classification: A Novel Approach to Entomological Studies
Ahmed Akib Jawad Karim
Muhammad Zawad Mahmud
Riasat Khan
95
3
0
12 Aug 2024
MetMamba: Regional Weather Forecasting with Spatial-Temporal Mamba Model
Haoyu Qin
Yungang Chen
Qianchuan Jiang
Pengchao Sun
Xiancai Ye
Chao Lin
Mamba
AI4CE
280
3
0
12 Aug 2024
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
Ke Zhou
Zhongwei Qiu
Dongmei Fu
VLM
159
5
0
12 Aug 2024
Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Xinrong Hu
Dewen Zeng
Yawen Wu
Xueyang Li
Yiyu Shi
ViT
MedIm
124
0
0
12 Aug 2024
Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images
Shouyue Liu
Jinkui Hao
Yuanyuan Gu
Huazhu Fu
Xinyu Guo
Shuting Zhang
Yitian Zhao
Hong Song
140
0
0
09 Aug 2024
Efficient and Accurate Pneumonia Detection Using a Novel Multi-Scale Transformer Approach
Alireza Saber
Pouria Parhami
Alimihammad Siahkarzadeh
Amirreza Fateh
MedIm
ViT
346
12
0
08 Aug 2024
What Happens Without Background? Constructing Foreground-Only Data for Fine-Grained Tasks
Yuetian Wang
W. Hou
Qinmu Peng
Xinge You
221
0
0
04 Aug 2024
LAM3D: Leveraging Attention for Monocular 3D Object Detection
IEEE International Workshop on Multimedia Signal Processing (MMSP), 2024
Diana-Alexandra Sas
Leandro Di Bella
Yangxintong Lyu
F. Oniga
Adrian Munteanu
137
2
0
03 Aug 2024
NVC-1B: A Large Neural Video Coding Model
Xihua Sheng
Chuanbo Tang
Li Li
Dong Liu
Feng Wu
3DV
VLM
157
5
0
28 Jul 2024
Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Zhijian Liu
Zhuoyang Zhang
Samir Khaki
Shang Yang
Haotian Tang
Chenfeng Xu
Kurt Keutzer
Song Han
SSeg
278
2
0
26 Jul 2024
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi
Minjing Dong
Mingjia Li
Chang Xu
Mamba
292
3
0
26 Jul 2024
HybridDepth: Robust Depth Fusion for Mobile AR by Leveraging Depth from Focus and Single-Image Priors
Ashkan Ganj
Hang Su
Tian Guo
MDE
136
0
0
26 Jul 2024
Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks
Zhicheng Cai
Hao Zhu
Qiu Shen
Xinran Wang
Xun Cao
297
6
0
25 Jul 2024
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu
Yubin Cho
Beoungwoo Kang
Seunghun Moon
Kyeongbo Kong
Suk-Ju Kang
184
11
0
24 Jul 2024
Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition
Gagan Bhatia
El Moatez Billah Nagoudi
Fakhraddin Alwajih
Muhammad Abdul-Mageed
143
10
0
18 Jul 2024
UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
Xin Li
Bingchen Li
Yeying Jin
Cuiling Lan
Hanxin Zhu
Yulin Ren
Zhibo Chen
227
12
0
18 Jul 2024
GroupMamba: Efficient Group-Based Visual State Space Model
Abdelrahman M. Shaker
Syed Talal Wasim
Salman Khan
Juergen Gall
Fahad Shahbaz Khan
Mamba
178
4
0
18 Jul 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
310
6
0
17 Jul 2024
MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
Xiaoshuai Hao
Ruikai Li
Hui Zhang
Dingzhe Li
Rong Yin
Sangil Jung
Seungsang Park
ByungIn Yoo
Haimei Zhao
Jing Zhang
228
28
0
16 Jul 2024
Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
Xueye Zheng
Yuanhuiyi Lyu
Jiazhou Zhou
Lin Wang
301
19
0
16 Jul 2024
SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
Hao Ding
Tuxun Lu
Yuqian Zhang
Ruixing Liang
Hongchao Shu
...
Bo Wang
Marcos Fernández-Rodríguez
Estevao Lima
João L. Vilaça
Mathias Unberath
520
7
0
16 Jul 2024
Backdoor Attacks against Image-to-Image Networks
Wenbo Jiang
Hongwei Li
Jiaming He
Rui Zhang
Guowen Xu
Tianwei Zhang
Rongxing Lu
AAML
167
8
0
15 Jul 2024
Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
Haoye Dong
Aviral Chharia
Wenbo Gou
Francisco Vicente Carrasco
Fernando de la Torre
Mamba
467
34
0
12 Jul 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Wenshuo Peng
Kaipeng Zhang
Yue Yang
Hao Zhang
Ping Luo
VLM
158
3
0
11 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
261
4
0
10 Jul 2024
HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification
Omar S. El-Assiouti
Ghada Hamed
Dina Khattab
H. M. Ebied
245
15
0
10 Jul 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
381
216
0
10 Jul 2024
CTRL-F: Pairing Convolution with Transformer for Image Classification via Multi-Level Feature Cross-Attention and Representation Learning Fusion
Hosam S. El-Assiouti
Hadeer El-Saadawy
M. Al-Berry
M. Tolba
ViT
200
0
0
09 Jul 2024
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang
Yulun Zhang
Fisher Yu
230
48
0
08 Jul 2024
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
Cheng Han
Qifan Wang
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Yi Fang
Qiang Guan
Lifu Huang
Dongfang Liu
VLM
183
12
0
05 Jul 2024
Semantically Guided Representation Learning For Action Anticipation
Anxhelo Diko
D. Avola
Bardh Prenkaj
Federico Fontana
Luigi Cinque
AI4TS
180
6
0
02 Jul 2024
Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces
Zhaohui Chen
Elyas Asadi Shamsabadi
Sheng Jiang
Luming Shen
Daniel Dias-da-Costa
Mamba
139
6
0
24 Jun 2024
Rethinking Remote Sensing Change Detection With A Mask View
Xiaowen Ma
Zhenkai Wu
Rongrong Lian
Wei Zhang
Siyang Song
182
7
0
21 Jun 2024
HumorDB: Can AI understand graphical humor?
Veedant Jain
Gabriel Kreiman
Felipe dos Santos Alves Feitosa
VLM
302
2
0
19 Jun 2024
ChangeViT: Unleashing Plain Vision Transformers for Change Detection
Duowang Zhu
Xiaohu Huang
Haiyan Huang
Zhenfeng Shao
Q. Cheng
281
20
0
18 Jun 2024
Demonstrating Agile Flight from Pixels without State Estimation
Ismail Geles
L. Bauersfeld
Angel Romero
Jiaxu Xing
Davide Scaramuzza
239
35
0
18 Jun 2024
Is Your HD Map Constructor Reliable under Sensor Corruptions?
Xiaoshuai Hao
Mengchuan Wei
Yifan Yang
Haimei Zhao
Hui Zhang
Yi Zhou
Qiang Wang
Weiming Li
Lingdong Kong
Jing Zhang
3DV
223
31
0
18 Jun 2024
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Di Wang
Meiqi Hu
Yao Jin
Yuchun Miao
Jiaqi Yang
...
Lefei Zhang
Chen Wu
Di Lin
Dacheng Tao
Liangpei Zhang
349
86
0
17 Jun 2024
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2024
Narges Norouzi
Svetlana Orlova
Daan de Geus
Gijs Dubbelman
ViT
FedML
191
22
0
14 Jun 2024
LieRE: Lie Rotational Positional Encodings
Sophie Ostmeier
Brian Axelrod
Michael E. Moseley
Akshay S. Chaudhari
C. Langlotz
312
1
0
14 Jun 2024