Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

IEEE International Conference on Computer Vision (ICCV), 2021

25 March 2021

Papers citing "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"

50 / 8,510 papers shown

HDW-SR: High-Frequency Guided Diffusion Model based on Wavelet Decomposition for Image Super-Resolution

225

17 Nov 2025

SkyReels-Text: Fine-grained Font-Controllable Text Editing for Poster Design

254

17 Nov 2025

Segment Anything Across Shots: A Method and Benchmark

333

17 Nov 2025

End-to-End Multi-Person Pose Estimation with Pose-Aware Video Transformer

114

17 Nov 2025

Concept Regions Matter: Benchmarking CLIP with a New Cluster-Importance Approach

254

17 Nov 2025

Semi-Supervised Multi-Task Learning for Interpretable Quality Assessment of Fundus Images

Biomedical Signal Processing and Control (BSPC), 2025

Lucas Gabriel Telesco

...

María de los Angeles Cenoz

17 Nov 2025

DiffPixelFormer: Differential Pixel-Aware Transformer for RGB-D Indoor Scene Segmentation

121

17 Nov 2025

CapeNext: Rethinking and Refining Dynamic Support Information for Category-Agnostic Pose Estimation

135

17 Nov 2025

H-CNN-ViT: A Hierarchical Gated Attention Multi-Branch Model for Bladder Cancer Recurrence Prediction

162

17 Nov 2025

MRIQT: Physics-Aware Diffusion Model for Image Quality Transfer in Neonatal Ultra-Low-Field MRI

335

17 Nov 2025

Towards 3D Object-Centric Feature Learning for Semantic Scene Completion

256

17 Nov 2025

Global-Lens Transformers: Adaptive Token Mixing for Dynamic Link Prediction

16 Nov 2025

MaskAnyNet: Rethinking Masked Image Regions as Valuable Information in Supervised Learning

125

16 Nov 2025

SAGE: Saliency-Guided Contrastive Embeddings

Colton R. Crum

Adam Czajka

135

16 Nov 2025

Seg-VAR: Image Segmentation with Visual Autoregressive Modeling

137

16 Nov 2025

LAYA: Layer-wise Attention Aggregation for Interpretable Depth-Aware Neural Networks

Gennaro Vessio

FAtt

191

16 Nov 2025

DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection

...

243

16 Nov 2025

Backdoor Attacks on Open Vocabulary Object Detectors via Multi-Modal Prompt Tuning

Ankita Raj

Chetan Arora

ObjD AAML VLM

294

16 Nov 2025

Towards Temporal Fusion Beyond the Field of View for Camera-based Semantic Scene Completion

133

16 Nov 2025

MSLoRA: Multi-Scale Low-Rank Adaptation via Attention Reweighting

Xu Yang

Gady Agam

127

16 Nov 2025

DCMM-Transformer: Degree-Corrected Mixed-Membership Attention for Medical Imaging

113

15 Nov 2025

Application of Graph Based Vision Transformers Architectures for Accurate Temperature Prediction in Fiber Specklegram Sensors

Abhishek Sebastian

141

15 Nov 2025

MTMed3D: A Multi-Task Transformer-Based Model for 3D Medical Imaging

124

15 Nov 2025

AGGRNet: Selective Feature Extraction and Aggregation for Enhanced Medical Image Classification

109

15 Nov 2025

A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates

Wei-Cheng Lee

Francesco Orabona

123

14 Nov 2025

Spatial Reasoning in Multimodal Large Language Models: A Survey of Tasks, Benchmarks and Methods

114

14 Nov 2025

Feature Quality and Adaptability of Medical Foundation Models: A Comparative Evaluation for Radiographic Classification and Segmentation

Frank Li

Theo Dapamede

Mohammadreza Chavoshi

...

109

12 Nov 2025

From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance

Jeongho Min

Dongyoung Kim

J. Lee

206

12 Nov 2025

Stratified Knowledge-Density Super-Network for Scalable Vision Transformers

128

12 Nov 2025

Selective Sinkhorn Routing for Improved Sparse Mixture of Experts

452

12 Nov 2025

MPCM-Net: Multi-scale network integrates partial attention convolution with Mamba for ground-based cloud image segmentation

12 Nov 2025

CSF-Net: Context-Semantic Fusion Network for Large Mask Inpainting

Chae-Yeon Heo

Yeong-Jun Cho

122

11 Nov 2025

How Modality Shapes Perception and Reasoning: A Study of Error Propagation in ARC-AGI

Bo Wen

Chen Wang

Erhan Bilal

11 Nov 2025

Invisible Triggers, Visible Threats! Road-Style Adversarial Creation Attack for Visual 3D Detection in Autonomous Driving

294

11 Nov 2025

The Impact of Longitudinal Mammogram Alignment on Breast Cancer Risk Assessment

Suaiba Amina Salahuddin

...

Michael C. Kampffmeyer

11 Nov 2025

H-Model: Dynamic Neural Architectures for Adaptive Processing

Dmytro Hospodarchuk

11 Nov 2025

From Noise to Latent: Generating Gaussian Latents for INR-Based Image Compression

183

11 Nov 2025

Distributed Zero-Shot Learning for Visual Recognition

212

11 Nov 2025

Range Asymmetric Numeral Systems-Based Lightweight Intermediate Feature Compression for Split Computing of Deep Neural Networks

101

11 Nov 2025

A Circular Argument: Does RoPE need to be Equivariant for Vision?

160

11 Nov 2025

Rethinking Explanation Evaluation under the Retraining Scheme

124

11 Nov 2025

Cross Modal Fine-Grained Alignment via Granularity-Aware and Region-Uncertain Modeling

166

11 Nov 2025

Real-Time LiDAR Super-Resolution via Frequency-Aware Multi-Scale Fusion

June Moh Goo

Zichao Zeng

Jan Boehm

10 Nov 2025

REOcc: Camera-Radar Fusion with Radar Feature Enrichment for 3D Occupancy Prediction

144

10 Nov 2025

Spatial-Frequency Enhanced Mamba for Multi-Modal Image Fusion

455

10 Nov 2025

QUARK: Quantization-Enabled Circuit Sharing for Transformer Acceleration by Exploiting Common Patterns in Nonlinear Operations

263

10 Nov 2025

Beyond Boundaries: Leveraging Vision Foundation Models for Source-Free Object Detection

114

10 Nov 2025

LeCoT: revisiting network architecture for two-view correspondence pruning

185

10 Nov 2025

MirrorMamba: Towards Scalable and Robust Mirror Detection in Videos

229

10 Nov 2025

Anatomy-Aware Lymphoma Lesion Detection in Whole-Body PET/CT

Simone Bendazzoli

A. Tzortzakakis

Andreas Abrahamsson

Björn Engelbrekt Wahlin

362

10 Nov 2025