
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

IEEE International Conference on Computer Vision (ICCV), 2021

25 March 2021


Papers citing "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"

50 / 8,588 papers shown

3rd Workshop on Maritime Computer Vision (MaCVi) 2025: Challenge Results

...

204

20 Jan 2025

MRI2Speech: Speech Synthesis from Articulatory Movements Recorded by Real-time MRI

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

246

20 Jan 2025

CSHNet: A Novel Information Asymmetric Image Translation Method

113

20 Jan 2025

Elucidating the Design Space of Dataset Condensation

Neural Information Processing Systems (NeurIPS), 2024

773

20 Jan 2025

Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025

Michael Schwingshackl

Fabio Francisco Oberweger

Markus Murschitz

271

20 Jan 2025

Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution

490

17 Jan 2025

MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation

IEEE International Conference on Computer Vision (ICCV), 2023

666

17 Jan 2025

FutureDepth: Learning to Predict the Future Improves Video Depth Estimation

European Conference on Computer Vision (ECCV), 2024

527

17 Jan 2025

A Comprehensive Survey of Foundation Models in Medicine

IEEE Reviews in Biomedical Engineering (RBME), 2024

797

17 Jan 2025

Unified Face Matching and Physical-Digital Spoofing Attack Detection

Arun Kunwar

Ajita Rattani

CVBM AAML

308

17 Jan 2025

WMamba: Wavelet-based Mamba for Face Forgery Detection

392

16 Jan 2025

NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion

332

15 Jan 2025

Self Pre-training with Adaptive Mask Autoencoders for Variable-Contrast 3D Medical Imaging

IEEE International Symposium on Biomedical Imaging (ISBI), 2025

233

15 Jan 2025

Towards Lightweight Time Series Forecasting: a Patch-wise Transformer with Weak Data Enriching

IEEE International Conference on Data Engineering (ICDE), 2025

170

14 Jan 2025

AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation

IEEE Transactions on Multimedia (TMM), 2025

150

14 Jan 2025

Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024

488

14 Jan 2025

Kolmogorov-Arnold Network for Remote Sensing Image Semantic Segmentation

258

13 Jan 2025

Toward Realistic Camouflaged Object Detection: Benchmarks and Method

205

13 Jan 2025

EdgeTAM: On-Device Track Anything Model

Computer Vision and Pattern Recognition (CVPR), 2025

...

Raghuraman Krishnamoorthi

327

13 Jan 2025

MathReader : Text-to-Speech for Mathematical Documents

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

322

13 Jan 2025

Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities

293

13 Jan 2025

Rethinking Knowledge in Distillation: An In-context Sample Retrieval Perspective

Jinjing Zhu

Songze Li

Lin Wang

332

13 Jan 2025

Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training

361

13 Jan 2025

CoreNet: Conflict Resolution Network for Point-Pixel Misalignment and Sub-Task Suppression of 3D LiDAR-Camera Object Detection

Information Fusion (Inf. Fusion), 2025

278

11 Jan 2025

YO-CSA-T: A Real-time Badminton Tracking System Utilizing YOLO Based on Contextual and Spatial Attention

Yuan Lai

Zhiwei Shi

Chengxi Zhu

11 Jan 2025

Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity

Navin Ranjan

Andreas E. Savakis

220

10 Jan 2025

HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection

300

10 Jan 2025

Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph

International Journal of Computer Vision (IJCV), 2024

357

10 Jan 2025

BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response

Hongruixuan Chen

Jian Song

Olivier Dietrich

Clifford Broni-bediako

...

956

10 Jan 2025

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

...

864

487

10 Jan 2025

MHAFF: Multi-Head Attention Feature Fusion of CNN and Transformer for Cattle Identification

IEEE Transactions on AgriFood Electronics (TAE), 2025

121

10 Jan 2025

MS-Temba: Multi-Scale Temporal Mamba for Understanding Long Untrimmed Videos

619

10 Jan 2025

CAMs as Shapley Value-based Explainers

The Visual Computer (Vis. Comput.), 2025

Huaiguang Cai

FAtt

239

09 Jan 2025

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

534

08 Jan 2025

AutoFish: Dataset and Benchmark for Fine-grained Analysis of Fish

176

08 Jan 2025

Learning Informative Latent Representation for Quantum State Tomography

IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), 2023

318

08 Jan 2025

Flemme: A Flexible and Modular Learning Platform for Medical Images

IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024

289

08 Jan 2025

Clinical Insights: A Comprehensive Review of Language Models in Medicine

PLOS Digital Health (PDH), 2024

597

08 Jan 2025

Siamese-DETR for Generic Multi-Object Tracking

IEEE Transactions on Image Processing (IEEE TIP), 2023

317

08 Jan 2025

BEN: Using Confidence-Guided Matting for Dichotomous Image Segmentation

Maxwell Meyer

Jack Spruyt

329

08 Jan 2025

Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights

Neural Information Processing Systems (NeurIPS), 2025

324

08 Jan 2025

MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Aadya Arora

Vinay Namboodiri

VLM

08 Jan 2025

NBBOX: Noisy Bounding Box Improves Remote Sensing Object Detection

IEEE Geoscience and Remote Sensing Letters (GRSL), 2024

394

08 Jan 2025

Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

IEEE Transactions on Medical Imaging (IEEE TMI), 2025

219

08 Jan 2025

Tighnari: Multi-modal Plant Species Prediction Based on Hierarchical Cross-Attention Using Graph-Based and Vision Backbone-Extracted Features

Conference and Labs of the Evaluation Forum (CLEF), 2025

07 Jan 2025

PARF-Net: integrating pixel-wise adaptive receptive fields into hybrid Transformer-CNN network for medical image segmentation

336

06 Jan 2025

Visual Large Language Models for Generalized and Specialized Applications

475

06 Jan 2025

Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

154

06 Jan 2025

MObI: Multimodal Object Inpainting Using Diffusion Models

451

06 Jan 2025

Facial Attractiveness Prediction in Live Streaming: A New Benchmark and Multi-modal Method

425

05 Jan 2025