Swin Transformer V2: Scaling Up Capacity and Resolution

18 November 2021

Papers citing "Swin Transformer V2: Scaling Up Capacity and Resolution"

50 / 932 papers shown

MamT^4: Multi-view Attention Networks for Mammography Cancer Classification

Annual International Computer Software and Applications Conference (COMPSAC), 2024

153

03 Nov 2024

IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision

Maxwell Meyer

Jack Spruyt

ViT

122

31 Oct 2024

DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

299

31 Oct 2024

Context-Aware Token Selection and Packing for Enhanced Vision Transformer

Tianyi Zhang

B. Li

Jae-sun Seo

Yu Cao

177

31 Oct 2024

Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

Adrian Iordache

B. Alexe

Radu Tudor Ionescu

306

29 Oct 2024

SAM-Swin: SAM-Driven Dual-Swin Transformers with Adaptive Lesion Enhancement for Laryngo-Pharyngeal Tumor Detection

Yun Li

157

29 Oct 2024

Enhancing Community Vision Screening -- AI Driven Retinal Photography for Early Disease Detection and Patient Trust

100

27 Oct 2024

PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding

Wang-Wang Yu

191

24 Oct 2024

FIPER: Factorized Features for Robust Image Super-Resolution and Compression

604

23 Oct 2024

LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices

325

22 Oct 2024

Test-time Adversarial Defense with Opposite Adversarial Path and High Attack Time Cost

551

22 Oct 2024

Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation?

Neural Information Processing Systems (NeurIPS), 2024

Lingao Xiao

Yang He

351

21 Oct 2024

D-SarcNet: A Dual-stream Deep Learning Framework for Automatic Analysis of Sarcomere Structures in Fluorescently Labeled hiPSC-CMs

IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024

19 Oct 2024

Towards Zero-Shot Camera Trap Image Categorization

Jiří Vyskočil

Lukas Picek

VLM

141

16 Oct 2024

Transformer based super-resolution downscaling for regional reanalysis: Full domain vs tiling approaches

Antonio Pérez

Mario Santa Cruz

Daniel San Martín

José Manuel Gutiérrez

106

16 Oct 2024

DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention

Asian Conference on Computer Vision (ACCV), 2024

221

11 Oct 2024

HorGait: A Hybrid Model for Accurate Gait Recognition in LiDAR Point Cloud Planar Projections

IEEE Access (IEEE Access), 2024

278

11 Oct 2024

Hespi: A pipeline for automatically detecting information from hebarium specimen sheets

119

11 Oct 2024

When Graph meets Multimodal: Benchmarking and Meditating on Multimodal Attributed Graphs Learning

223

11 Oct 2024

IceDiff: High Resolution and High-Quality Sea Ice Forecasting with Generative Diffusion Prior

167

10 Oct 2024

Iterative Optimization Annotation Pipeline and ALSS-YOLO-Seg for Efficient Banana Plantation Segmentation in UAV Imagery

Frontiers in Plant Science (Front. Plant Sci.), 2024

169

09 Oct 2024

GLRT-Based Metric Learning for Remote Sensing Object Retrieval

Linping Zhang

Yu Liu

Xueqian Wang

Gang Li

You He

213

08 Oct 2024

Guided Self-attention: Find the Generalized Necessarily Distinct Vectors for Grain Size Grading

121

08 Oct 2024

Rank Matters: Understanding and Defending Model Inversion Attacks via Low-Rank Feature Filtering

272

08 Oct 2024

MetaDD: Boosting Dataset Distillation with Neural Network Architecture-Invariant Generalization

Xiu Su

246

07 Oct 2024

Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time

European Conference on Computer Vision (ECCV), 2024

Chiao-An Yang

Ziwei Liu

Raymond A. Yeh

159

01 Oct 2024

CBAM-SwinT-BL: Small Rail Surface Defect Detection Method Based on Swin Transformer with Block Level CBAM Enhancement

IEEE Access (IEEE Access), 2024

229

30 Sep 2024

Universal Medical Image Representation Learning with Compositional Decoders

Kaini Wang

Ling Yang

Siping Zhou

Guangquan Zhou

Wentao Zhang

Bin Cui

Shuo Li

SSL MedIm

303

30 Sep 2024

All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation

Neural Information Processing Systems (NeurIPS), 2024

Xu Zhang

Peiyao Guo

Ming Lu

Zhan Ma

282

29 Sep 2024

Exploring Token Pruning in Vision State Space Models

Neural Information Processing Systems (NeurIPS), 2024

...

Yanzhi Wang

387

27 Sep 2024

Cottention: Linear Transformers With Cosine Attention

Gabriel Mongaras

Trevor Dohm

Eric C. Larson

167

27 Sep 2024

HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting

International Conference on Learning Representations (ICLR), 2024

417

27 Sep 2024

MALPOLON: A Framework for Deep Species Distribution Modeling

133

26 Sep 2024

HydraViT: Stacking Heads for a Scalable ViT

Neural Information Processing Systems (NeurIPS), 2024

Janek Haberer

A. Hojjat

Olaf Landsiedel

215

26 Sep 2024

TSCLIP: Robust CLIP Fine-Tuning for Worldwide Cross-Regional Traffic Sign Recognition

IEEE International Conference on Robotics and Automation (ICRA), 2024

Yuxuan Liu

Ming Liu

Jun Ma

VLM CLIP

987

23 Sep 2024

Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection

European Conference on Computer Vision (ECCV), 2024

Bin Li

196

22 Sep 2024

Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification

Theodore Leng

Hamed Tabkhi

121

17 Sep 2024

InfoDisent: Explainability of Image Classification Models by Information Disentanglement

Łukasz Struski

Dawid Rymarczyk

Jacek Tabor

365

16 Sep 2024

GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion

International Conference on 3D Vision (3DV), 2024

Vitor Campagnolo Guizilini

280

15 Sep 2024

LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation

Qiyuan Wang

Shang Zhao

Zikang Xu

S Kevin Zhou

411

14 Sep 2024

PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage

Asian Conference on Computer Vision (ACCV), 2024

Denis Zavadski

Damjan Kalšan

Carsten Rother

DiffM MDE

293

13 Sep 2024

Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization

570

12 Sep 2024

Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU

Zhenyu Ning

Jieru Zhao

Qihao Jin

Wenchao Ding

Minyi Guo

11 Sep 2024

EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation

International Conference on Machine Learning and Applications (ICMLA), 2024

Nischal Khanal

Shivanand Venkanna Sheshappanavar

MDE

334

10 Sep 2024

Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery

IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024

Fan Zhang

Lingling Li

Licheng Jiao

Xu Liu

Fang Liu

Shuyuan Yang

B. Hou

ObjD

264

09 Sep 2024

UNIT: Unifying Image and Text Recognition in One Vision Encoder

Neural Information Processing Systems (NeurIPS), 2024

Yi Zhu

Jianhua Han

279

06 Sep 2024

SDformerFlow: Spatiotemporal swin spikeformer for event-based optical flow estimation

Yi Tian

Juan Andrade-Cetto

198

06 Sep 2024

iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation

Hayeon Jo

Hyesong Choi

Minhee Cho

Dongbo Min

340

04 Sep 2024

Cross-domain Multi-step Thinking: Zero-shot Fine-grained Traffic Sign Recognition in the Wild

Knowledge-Based Systems (KBS), 2024

331

03 Sep 2024

SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation

221

02 Sep 2024