Swin Transformer V2: Scaling Up Capacity and Resolution

18 November 2021

Papers citing "Swin Transformer V2: Scaling Up Capacity and Resolution"

Showing 50 of 933 citing papers

ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images

European Conference on Computer Vision (ECCV), 2024

158

15 Mar 2024

Rethinking Referring Object Removal

203

14 Mar 2024

DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers

Annual IEEE International Conference on Pervasive Computing and Communications (PerCom), 2024

184

14 Mar 2024

MonoOcc: Digging into Monocular Semantic Occupancy Prediction

IEEE International Conference on Robotics and Automation (ICRA), 2024

Xiang Li

Bu Jin

Hao Zhao

220

13 Mar 2024

CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

AAAI Conference on Artificial Intelligence (AAAI), 2024

Jiawei Shao

411

13 Mar 2024

Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition

256

11 Mar 2024

DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos

Ying Shan

Xiaojuan Qi

MDE

247

09 Mar 2024

Probabilistic Image-Driven Traffic Modeling via Remote Sensing

European Conference on Computer Vision (ECCV), 2024

Scott Workman

Armin Hadzic

190

08 Mar 2024

SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization

Rongfeng Lu

Chenggang Yan

237

07 Mar 2024

xT: Nested Tokenization for Larger Context in Large Images

240

04 Mar 2024

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Hanxin Zhu

219

29 Feb 2024

CAMixerSR: Only Details Need More "Attention"

Li Zhang

256

29 Feb 2024

Effective Message Hiding with Order-Preserving Mechanisms

Yu Gao

Xuchong Qiu

Zihan Ye

340

29 Feb 2024

Mixer is more than just a model

Qingfeng Ji

Yuxin Wang

Letong Sun

178

28 Feb 2024

State Space Models for Event Cameras

Nikola Zubić

Mathias Gehrig

Davide Scaramuzza

498

23 Feb 2024

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang

I-Hau Yeh

Hongpeng Liao

426

2,984

21 Feb 2024

TransGOP: Transformer-Based Gaze Object Prediction

232

21 Feb 2024

LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks

Truong Thanh Hung Nguyen

Tobias Clement

Phuc Truong Loc Nguyen

268

19 Feb 2024

Stealing the Invisible: Unveiling Pre-Trained CNN Models through Adversarial Examples and Timing Side-Channels

348

19 Feb 2024

AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision Transformer

Tanmoy Dam

Sanjay Bhargav Dharavath

235

12 Feb 2024

Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

338

245

07 Feb 2024

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

314

07 Feb 2024

Neural Networks Learn Statistics of Increasing Complexity

239

06 Feb 2024

SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images

146

06 Feb 2024

CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse

Cunhan Guo

Heyan Huang

216

03 Feb 2024

Bass Accompaniment Generation via Latent Diffusion

Marco Pasini

M. Grachten

Stefan Lattner

206

02 Feb 2024

A Manifold Representation of the Key in Vision Transformers

355

01 Feb 2024

SimAda: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes

259

31 Jan 2024

Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels

228

30 Jan 2024

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Bin Lin

...

442

270

29 Jan 2024

VJT: A Video Transformer on Joint Tasks of Deblurring, Low-light Enhancement and Denoising

Jinshan Pan

245

26 Jan 2024

CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process

International Conference on Machine Learning (ICML), 2024

Xiangchen Song

271

25 Jan 2024

An open dataset for the evolution of oracle bone characters: EVOBC

Yuliang Liu

Lianwen Jin

301

23 Jan 2024

AdaEmbed: Semi-supervised Domain Adaptation in the Embedding Space

A. Mottaghi

Mohammad Abdullah Jamal

Serena Yeung

Omid Mohareri

176

23 Jan 2024

OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease Detection

188

22 Jan 2024

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

International Conference on Machine Learning (ICML), 2024

Katherine Crowson

Stefan Andreas Baumann

Alex Birch

Tanishq Mathew Abraham

Daniel Z. Kaplan

Enrico Shippole

338

21 Jan 2024

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

672

1,440

19 Jan 2024

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference

Yang You

293

19 Jan 2024

Deep spatial context: when attention-based models meet spatial regression

Paulina Tomaszewska

Elżbieta Sienkiewicz

Mai P. Hoang

Przemysław Biecek

207

18 Jan 2024

Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy

Data Compression Conference (DCC), 2024

Xin Li

246

16 Jan 2024

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Computer Vision and Pattern Recognition (CVPR), 2024

281

16 Jan 2024

Discriminative Consensus Mining with A Thousand Groups for More Accurate Co-Salient Object Detection

Peng Zheng

301

15 Jan 2024

MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction

Toyota Li

225

14 Jan 2024

Transformer-CNN Fused Architecture for Enhanced Skin Lesion Segmentation

Siddharth Tiwari

MedIm ViT

169

10 Jan 2024

Revisiting Adversarial Training at Scale

Computer Vision and Pattern Recognition (CVPR), 2024

Zeyu Wang

Xianhang Li

Hongru Zhu

Cihang Xie

430

09 Jan 2024

GTA: Guided Transfer of Spatial Attention from Object-Centric Representations

182

05 Jan 2024

AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification

326

05 Jan 2024

Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment

194

05 Jan 2024

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

Computer Vision and Pattern Recognition (CVPR), 2024

530

04 Jan 2024

Hybrid Pooling and Convolutional Network for Improving Accuracy and Training Convergence Speed in Object Detection

282

02 Jan 2024