Swin Transformer V2: Scaling Up Capacity and Resolution

18 November 2021

Papers citing "Swin Transformer V2: Scaling Up Capacity and Resolution"

50 / 932 papers shown

Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories

179

04 Sep 2025

TinyDrop: Tiny Model Guided Token Dropping for Vision Transformers

129

03 Sep 2025

Object Detection with Multimodal Large Vision-Language Models: An In-depth Review

Information Fusion (Inf. Fusion), 2025

Ranjan Sapkota

Manoj Karkee

ObjD VLM

290

25 Aug 2025

Expandable Residual Approximation for Knowledge Distillation

IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025

117

22 Aug 2025

Vision encoders should be image size agnostic and task driven

22 Aug 2025

Automated Multi-label Classification of Eleven Retinal Diseases: A Benchmark of Modern Architectures and a Meta-Ensemble on a Large Synthetic Dataset

104

21 Aug 2025

Scalable Event-Based Video Streaming for Machines with MoQ

Mile-High Video Conference (MHV), 2025

Andrew C. Freeman

112

20 Aug 2025

On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines

319

20 Aug 2025

MedFormer: a data-driven model for forecasting the Mediterranean Sea

...

132

16 Aug 2025

Privacy-enhancing Sclera Segmentation Benchmarking Competition: SSBC 2025

...

Raghavendra Ramachandra

148

14 Aug 2025

UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale

Yuhao Wang

Wei Xi

211

12 Aug 2025

Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment

196

12 Aug 2025

CoCAViT: Compact Vision Transformer with Robust Global Coordination

112

07 Aug 2025

Prototype-Driven Structure Synergy Network for Remote Sensing Images Segmentation

IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2025

139

06 Aug 2025

SolarSeer: Ultrafast and accurate 24-hour solar irradiance forecasts outperforming numerical weather prediction across the USA

...

05 Aug 2025

TopoImages: Incorporating Local Topology Encoding into Deep Learning Models for Medical Image Classification

100

03 Aug 2025

Evading Data Provenance in Deep Neural Networks

251

01 Aug 2025

Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations

172

29 Jul 2025

Can Foundation Models Predict Fitness for Duty?

Juan E. Tapia

Christoph Busch

27 Jul 2025

VAMPIRE: Uncovering Vessel Directional and Morphological Information from OCTA Images for Cardiovascular Disease Risk Factor Prediction

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

147

26 Jul 2025

Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows

Simin Huo

Ning Li

ViT

240

24 Jul 2025

CLARIFID: Improving Radiology Report Generation by Reinforcing Clinically Accurate Impressions and Enforcing Detailed Findings

319

23 Jul 2025

DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD

137

23 Jul 2025

IONext: Unlocking the Next Era of Inertial Odometry

158

23 Jul 2025

Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Resolving Small-scale Eddies Toward the Viscous Limit

Junqi Yin

Mijanur Palash

M. Paul Laiu

Muralikrishnan Gopalakrishnan Meena

113

22 Jul 2025

A High Magnifications Histopathology Image Dataset for Oral Squamous Cell Carcinoma Diagnosis and Prognosis

167

22 Jul 2025

DeSamba: Decoupled Spectral Adaptive Framework for 3D Multi-Sequence MRI Lesion Classification

230

21 Jul 2025

MedSR-Impact: Transformer-Based Super-Resolution for Lung CT Segmentation, Radiomics, Classification, and Prognosis

144

21 Jul 2025

Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking

196

15 Jul 2025

ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference

221

14 Jul 2025

ViFusionTST: Deep Fusion of Time-Series Image Representations from Load Signals for Early Bed-Exit Prediction

164

25 Jun 2025

AeroGPT: Leveraging Large-Scale Audio Model for Aero-Engine Bearing Fault Diagnosis

163

19 Jun 2025

A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects

Guohuan Xie

Syed Ariff Syed Hesham

166

16 Jun 2025

LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning

210

14 Jun 2025

Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches

Information Fusion (Inf. Fusion), 2025

251

12 Jun 2025

DeepTraverse: A Depth-First Search Inspired Network for Algorithmic Visual Understanding

Bin Guo

John H.L. Hansen

222

11 Jun 2025

SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields

207

11 Jun 2025

Canonical Latent Representations in Conditional Diffusion Models

250

11 Jun 2025

MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models

188

09 Jun 2025

Can Foundation Models Generalise the Presentation Attack Detection Capabilities on ID Cards?

Juan E. Tapia

Christoph Busch

217

05 Jun 2025

Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Dan Oneaţă

Desmond Elliott

Stella Frank

187

04 Jun 2025

FuXi-Ocean: A Global Ocean Forecasting System with Sub-Daily Resolution

216

03 Jun 2025

RoadFormer : Local-Global Feature Fusion for Road Surface Classification in Autonomous Driving

Tianze Wang

Zhang Zhang

Chao Sun

191

03 Jun 2025

Learning Sparsity for Effective and Efficient Music Performance Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

238

02 Jun 2025

PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations

248

30 May 2025

Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization

539

29 May 2025

FeatInv: Spatially resolved mapping from feature space to input space using conditional diffusion models

296

27 May 2025

The Missing Point in Vision Transformers for Universal Image Segmentation

Konstantinos N. Plataniotis

Arash Mohammadi

ViT ISeg

308

26 May 2025

Towards Fully FP8 GEMM LLM Training at Scale

Alejandro Hernández Cano

Dhia Garbaya

Imanol Schlag

Martin Jaggi

354

26 May 2025

Asymmetric Duos: Sidekicks Improve Uncertainty

461

24 May 2025