2111.09883
Cited By
v1
v2 (latest)
Swin Transformer V2: Scaling Up Capacity and Resolution
18 November 2021
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
Yixuan Wei
Jia Ning
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
ViT
Github (14834★)
Papers citing
"Swin Transformer V2: Scaling Up Capacity and Resolution"
50 / 933 papers shown
ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images
European Conference on Computer Vision (ECCV), 2024
Xiangtian Xue
Jiasong Wu
Youyong Kong
L. Senhadji
Huazhong Shu
DiffM
158
1
0
15 Mar 2024
Rethinking Referring Object Removal
Xiangtian Xue
Jiasong Wu
Youyong Kong
L. Senhadji
Huazhong Shu
DiffM
203
0
0
14 Mar 2024
DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers
Annual IEEE International Conference on Pervasive Computing and Communications (PerCom), 2024
Xiao Ma
Shengfeng He
Hezhe Qiao
Dong-Lai Ma
184
3
0
14 Mar 2024
MonoOcc: Digging into Monocular Semantic Occupancy Prediction
IEEE International Conference on Robotics and Automation (ICRA), 2024
Yupeng Zheng
Xiang Li
Pengfei Li
Yuhang Zheng
Bu Jin
Chengliang Zhong
Xiaoxiao Long
Hao Zhao
Qichao Zhang
220
44
0
13 Mar 2024
CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression
AAAI Conference on Artificial Intelligence (AAAI), 2024
Xinjie Zhang
Shenyuan Gao
Zhening Liu
Jiawei Shao
Xingtong Ge
Dailan He
Tongda Xu
Yan Wang
Jun Zhang
411
4
0
13 Mar 2024
Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition
Erkut Akdag
Zeqi Zhu
Egor Bondarev
Peter H. N. de With
ViT
256
8
0
11 Mar 2024
DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos
Xiuzhe Wu
Xiaoyang Lyu
Qihao Huang
Yong-Jin Liu
Yang Wu
Ying Shan
Xiaojuan Qi
MDE
247
0
0
09 Mar 2024
Probabilistic Image-Driven Traffic Modeling via Remote Sensing
European Conference on Computer Vision (ECCV), 2024
Scott Workman
Armin Hadzic
190
0
0
08 Mar 2024
SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization
Quan Chen
Tingyu Wang
Zihao Yang
Haoran Li
Rongfeng Lu
Yaoqi Sun
Bolun Zheng
Chenggang Yan
237
39
0
07 Mar 2024
xT: Nested Tokenization for Larger Context in Large Images
Ritwik Gupta
Shufan Li
Tyler Lixuan Zhu
Jitendra Malik
Trevor Darrell
K. Mangalam
ViT
240
8
0
04 Mar 2024
SeD: Semantic-Aware Discriminator for Image Super-Resolution
Bingchen Li
Xin Li
Hanxin Zhu
Yeying Jin
Ruoyu Feng
Zhizheng Zhang
Zhibo Chen
SupR
219
42
0
29 Feb 2024
CAMixerSR: Only Details Need More "Attention"
Yan Wang
Yi Liu
Shijie Zhao
Junlin Li
Li Zhang
SupR
256
53
0
29 Feb 2024
Effective Message Hiding with Order-Preserving Mechanisms
Yu Gao
Xuchong Qiu
Zihan Ye
340
4
0
29 Feb 2024
Mixer is more than just a model
Qingfeng Ji
Yuxin Wang
Letong Sun
178
0
0
28 Feb 2024
State Space Models for Event Cameras
Nikola Zubić
Mathias Gehrig
Davide Scaramuzza
498
78
0
23 Feb 2024
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang
I-Hau Yeh
Hong-Yuan Mark Liao
426
2,984
0
21 Feb 2024
TransGOP: Transformer-Based Gaze Object Prediction
Binglu Wang
Chenxi Guo
Yang Jin
Haisheng Xia
Nian Liu
232
6
0
21 Feb 2024
LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks
Truong Thanh Hung Nguyen
Tobias Clement
Phuc Truong Loc Nguyen
Nils Kemmerzell
Van Binh Truong
V. Nguyen
Mohamed Abdelaal
Hung Cao
VLM
268
16
0
19 Feb 2024
Stealing the Invisible: Unveiling Pre-Trained CNN Models through Adversarial Examples and Timing Side-Channels
Shubhi Shukla
Manaar Alam
Pabitra Mitra
Debdeep Mukhopadhyay
MLAU
AAML
348
2
0
19 Feb 2024
AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision Transformer
Tanmoy Dam
Sanjay Bhargav Dharavath
Sameer Alam
Nimrod Lilith
Supriyo Chakraborty
Mir Feroskhan
235
4
0
12 Feb 2024
Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation
Ziyang Wang
Jian-Qing Zheng
Yichi Zhang
Ge Cui
Lei Li
Mamba
338
245
0
07 Feb 2024
Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
Abhimanyu Bambhaniya
Amir Yazdanbakhsh
Suvinay Subramanian
Sheng-Chun Kao
Shivani Agrawal
Utku Evci
Tushar Krishna
314
24
0
07 Feb 2024
Neural Networks Learn Statistics of Increasing Complexity
Nora Belrose
Quintin Pope
Lucia Quirke
Alex Troy Mallen
Xiaoli Z. Fern
239
19
0
06 Feb 2024
SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images
Pengming Feng
Mingjie Xie
Hongning Liu
Xuanjia Zhao
Guangjun He
Xueliang Zhang
Jian Guan
146
2
0
06 Feb 2024
CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse
Cunhan Guo
Heyan Huang
216
5
0
03 Feb 2024
Bass Accompaniment Generation via Latent Diffusion
Marco Pasini
M. Grachten
Stefan Lattner
206
19
0
02 Feb 2024
A Manifold Representation of the Key in Vision Transformers
Li Meng
Morten Goodwin
Anis Yazidi
P. Engelstad
355
1
0
01 Feb 2024
SimAda: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes
Yiran Song
Qianyu Zhou
Xuequan Lu
Zhiwen Shao
Lizhuang Ma
259
7
0
31 Jan 2024
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels
Chak Fong Chong
Xinyi Fang
Jielong Guo
Yapeng Wang
Wei Ke
C. Lam
Sio-Kei Im
228
3
0
30 Jan 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Bin Lin
Zhenyu Tang
Yang Ye
Jiaxi Cui
Bin Zhu
...
Jinfa Huang
Junwu Zhang
Yatian Pang
Munan Ning
Li-ming Yuan
VLM
MLLM
MoE
442
270
0
29 Jan 2024
VJT: A Video Transformer on Joint Tasks of Deblurring, Low-light Enhancement and Denoising
Yuxiang Hui
Yang Liu
Yaofang Liu
Fan Jia
Jinshan Pan
Raymond H. F. Chan
Tieyong Zeng
ViT
245
3
0
26 Jan 2024
CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process
International Conference on Machine Learning (ICML), 2024
Guan-Hong Chen
Yifan Shen
Zhenhao Chen
Xiangchen Song
Yuewen Sun
Weiran Yao
Xiao Liu
Kun Zhang
CML
271
15
0
25 Jan 2024
An open dataset for the evolution of oracle bone characters: EVOBC
Haisu Guan
Jinpeng Wan
Yuliang Liu
Pengjie Wang
Kaile Zhang
Zhebin Kuang
Xinyu Wang
Xiang Bai
Lianwen Jin
301
11
0
23 Jan 2024
AdaEmbed: Semi-supervised Domain Adaptation in the Embedding Space
A. Mottaghi
Mohammad Abdullah Jamal
Serena Yeung
Omid Mohareri
176
2
0
23 Jan 2024
OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease Detection
Fatema Jannat
Sina Gholami
Minha Alam
Hamed Tabkhi
188
3
0
22 Jan 2024
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
International Conference on Machine Learning (ICML), 2024
Katherine Crowson
Stefan Andreas Baumann
Alex Birch
Tanishq Mathew Abraham
Daniel Z. Kaplan
Enrico Shippole
338
80
0
21 Jan 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang
Bingyi Kang
Zilong Huang
Xiaohan Li
Jiashi Feng
Hengshuang Zhao
VLM
672
1,440
0
19 Jan 2024
AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
Xuanlei Zhao
Shenggan Cheng
Guangyang Lu
Jiarui Fang
Hao Zhou
Bin Jia
Ziming Liu
Yang You
MQ
293
4
0
19 Jan 2024
Deep spatial context: when attention-based models meet spatial regression
Paulina Tomaszewska
Elżbieta Sienkiewicz
Mai P. Hoang
Przemysław Biecek
207
1
0
18 Jan 2024
Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy
Data Compression Conference (DCC), 2024
Zihao Yu
Fengbin Guan
Yiting Lu
Xin Li
Zhibo Chen
ViT
246
7
0
16 Jan 2024
Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
Computer Vision and Pattern Recognition (CVPR), 2024
Leheng Zhang
Yawei Li
Xingyu Zhou
Xiaorui Zhao
Shuhang Gu
SupR
281
76
0
16 Jan 2024
Discriminative Consensus Mining with A Thousand Groups for More Accurate Co-Salient Object Detection
Peng Zheng
301
0
0
15 Jan 2024
MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction
Toyota Li
225
8
0
14 Jan 2024
Transformer-CNN Fused Architecture for Enhanced Skin Lesion Segmentation
Siddharth Tiwari
MedIm
ViT
169
2
0
10 Jan 2024
Revisiting Adversarial Training at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Zeyu Wang
Xianhang Li
Hongru Zhu
Cihang Xie
430
33
0
09 Jan 2024
GTA: Guided Transfer of Spatial Attention from Object-Centric Representations
SeokHyun Seo
Jinwoo Hong
Jungwoo Chae
Kyungyul Kim
Sangheum Hwang
182
0
0
05 Jan 2024
AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification
Huy Nguyen
Kien Nguyen
Sridha Sridharan
Clinton Fookes
326
32
0
05 Jan 2024
Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment
Yongxu Liu
Yinghui Quan
Guoyao Xiao
Aobo Li
Jinjian Wu
194
17
0
05 Jan 2024
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
Computer Vision and Pattern Recognition (CVPR), 2024
Yiran Song
Qianyu Zhou
Hefei Ling
Deng-Ping Fan
Xuequan Lu
Lizhuang Ma
VLM
530
20
0
04 Jan 2024
Hybrid Pooling and Convolutional Network for Improving Accuracy and Training Convergence Speed in Object Detection
Shiwen Zhao
Wei Wang
Junhui Hou
Haihang Wu
ObjD
282
0
0
02 Jan 2024