Swin Transformer V2: Scaling Up Capacity and Resolution (arXiv 2111.09883)

18 November 2021
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
Yixuan Wei
Jia Ning
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
    ViT

Papers citing "Swin Transformer V2: Scaling Up Capacity and Resolution"

50 / 931 papers shown
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Computer Vision and Pattern Recognition (CVPR), 2022
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
205
54
0
17 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP
539
888
0
14 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Computer Vision and Pattern Recognition (CVPR), 2022
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Jiaming Song
Xiaogang Wang
Yu Qiao
VLM
491
939
0
10 Nov 2022
OneFormer: One Transformer to Rule Universal Image Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Jitesh Jain
Jiacheng Li
M. Chiu
Ali Hassani
Nikita Orlov
Humphrey Shi
ViT
268
466
0
10 Nov 2022
Demystify Transformers & Convolutions in Modern Image Deep Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jifeng Dai
Min Shi
Weiyun Wang
Sitong Wu
Linjie Xing
...
Lewei Lu
Jie Zhou
Xiaogang Wang
Botian Shi
Xiao-hua Hu
ViT
237
11
0
10 Nov 2022
Efficient Image Generation with Variadic Attention Heads
Steven Walton
Ali Hassani
Xingqian Xu
Zinan Lin
Humphrey Shi
ViT
248
23
0
10 Nov 2022
Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
Qiang Chen
Jian Wang
Chuchu Han
Shangang Zhang
Zexian Li
...
Haocheng Feng
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
ViT, VLM
179
49
0
07 Nov 2022
Late Fusion with Triplet Margin Objective for Multimodal Ideology Prediction and Analysis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Changyuan Qiu
Winston Wu
Xinliang Frederick Zhang
Lu Wang
120
1
0
04 Nov 2022
Could Giant Pretrained Image Models Extract Universal Representations?
Neural Information Processing Systems (NeurIPS), 2022
Yutong Lin
Ze Liu
Zheng Zhang
Han Hu
Nanning Zheng
Stephen Lin
Yue Cao
VLM
162
9
0
03 Nov 2022
Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning
Neural Information Processing Systems (NeurIPS), 2022
Yixuan Pei
Zhiwu Qing
Jun Cen
Xiang Wang
Shiwei Zhang
Yaxiong Wang
Mingqian Tang
Nong Sang
Xueming Qian
125
20
0
02 Nov 2022
State-of-the-art Models for Object Detection in Various Fields of Application
S. A. G. Naqvi
Syed Shahnawaz Ali
ObjD, OOD
220
0
0
01 Nov 2022
Point-Syn2Real: Semi-Supervised Synthetic-to-Real Cross-Domain Learning for Object Classification in 3D Point Clouds
IEEE International Conference on Multimedia and Expo (ICME), 2022
Ziwei Wang
Reza Arablouei
Jiajun Liu
Paulo Borges
G. Bishop-Hurley
Nic Heaney
3DPC
100
3
0
31 Oct 2022
The Curious Case of Benign Memorization
International Conference on Learning Representations (ICLR), 2022
Sotiris Anagnostidis
Gregor Bachmann
Lorenzo Noci
Thomas Hofmann
AAML
307
12
0
25 Oct 2022
BARS: A Benchmark for Airport Runway Segmentation
Wenhui Chen
Zhijiang Zhang
Liang Yu
Yichun Tai
320
14
0
24 Oct 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
AAAI Conference on Artificial Intelligence (AAAI), 2022
Chi Zhang
Lu Zhou
Lei Wang
Zaiyan Dai
Jun Yang
ViT
406
46
0
22 Oct 2022
Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Xiangyu Chen
Qinghao Hu
Kaidong Li
Cuncong Zhong
Guanghui Wang
ViT
231
20
0
22 Oct 2022
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
226
42
0
19 Oct 2022
A Tri-Layer Plugin to Improve Occluded Detection
British Machine Vision Conference (BMVC), 2022
Guanqi Zhan
Weidi Xie
Andrew Zisserman
169
26
0
18 Oct 2022
Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation
Rui Li
Weihua Li
Yi Yang
Hanyu Wei
Jianhua Jiang
Quan-wei Bai
DiffM
320
17
0
18 Oct 2022
Token Merging: Your ViT But Faster
International Conference on Learning Representations (ICLR), 2022
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
376
701
0
17 Oct 2022
2nd Place Solution to Google Universal Image Embedding
Xiaolong Huang
Qiankun Li
SSL
220
2
0
17 Oct 2022
Probabilistic Integration of Object Level Annotations in Chest X-ray Classification
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tom van Sonsbeek
Xiantong Zhen
Dwarikanath Mahapatra
M. Worring
200
16
0
13 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Journal of Machine Learning Research (JMLR), 2022
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
268
63
0
13 Oct 2022
S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces
Eric N. D. Nguyen
Karan Goel
Albert Gu
Gordon W. Downs
Preey Shah
Tri Dao
S. Baccus
Christopher Ré
VLM
201
50
0
12 Oct 2022
How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization
International Conference on Learning Representations (ICLR), 2022
Jonas Geiping
Micah Goldblum
Gowthami Somepalli
Ravid Shwartz-Ziv
Tom Goldstein
A. Wilson
268
50
0
12 Oct 2022
Match Cutting: Finding Cuts with Smooth Visual Transitions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Boris Chen
Amir Ziai
Rebecca Tucker
Yuchen Xie
VGen
248
17
0
11 Oct 2022
Curved Representation Space of Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2022
Juyeop Kim
Junha Park
Songkuk Kim
Jongseok Lee
ViT
245
9
0
11 Oct 2022
Rethinking the Detection Head Configuration for Traffic Object Detection
Yi Shi
Jiang Wu
Shixuan Zhao
Gangyao Gao
T. Deng
Hongmei Yan
ObjD
138
6
0
08 Oct 2022
Humans need not label more humans: Occlusion Copy & Paste for Occluded Human Instance Segmentation
British Machine Vision Conference (BMVC), 2022
Evan Ling
De-Kai Huang
Minhoe Hur
171
6
0
07 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
International Conference on Learning Representations (ICLR), 2022
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT, MoE
289
76
0
04 Oct 2022
Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration
Sixiang Chen
Tian-Chun Ye
Yun-Peng Liu
Erkang Chen
ViT
111
17
0
03 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng Zhang
Chao Zhang
Hanhua Hu
229
38
0
03 Oct 2022
Dilated Neighborhood Attention Transformer
Ali Hassani
Humphrey Shi
ViT, MedIm
241
95
0
29 Sep 2022
Transfer Learning with Pretrained Remote Sensing Transformers
A. Fuller
K. Millard
J.R. Green
204
11
0
28 Sep 2022
Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration
Marcos V. Conde
Ui-Jin Choi
Maxime Burchi
Radu Timofte
ViT
226
188
0
22 Sep 2022
VINet: Visual and Inertial-based Terrain Classification and Adaptive Navigation over Unknown Terrain
IEEE International Conference on Robotics and Automation (ICRA), 2022
Tianrui Guan
Ruitao Song
Zhixian Ye
Liangjun Zhang
182
16
0
16 Sep 2022
Communication-Efficient and Privacy-Preserving Feature-based Federated Transfer Learning
Global Communications Conference (GLOBECOM), 2022
Feng Wang
M. C. Gursoy
Senem Velipasalar
223
3
0
12 Sep 2022
LRT: An Efficient Low-Light Restoration Transformer for Dark Light Field Images
IEEE Transactions on Image Processing (IEEE TIP), 2022
Shansi Zhang
Nan Meng
E. Lam
ViT
198
29
0
06 Sep 2022
A Review of Sparse Expert Models in Deep Learning
W. Fedus
J. Dean
Barret Zoph
MoE
228
188
0
04 Sep 2022
AutoPET Challenge: Combining nn-Unet with Swin UNETR Augmented by Maximum Intensity Projection Classifier
Lars Heiliger
Zdravko Marinov
Max Hasin
André Ferreira
Jana Fragemann
...
D. Kersting
Victor Alves
Rainer Stiefelhagen
Jan Egger
Jens Kleesiek
91
11
0
02 Sep 2022
AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results
Ren Yang
Radu Timofte
Xin Li
Tao Gui
Lin Zhang
...
Yijian Zhang
Mao Ye
Dengyan Luo
Xiaofeng Pan
L. Peng
SupR
215
34
0
23 Aug 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM, VLM, ViT
530
704
0
22 Aug 2022
Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets
Hao Chen
R. Tao
Han Zhang
Yidong Wang
Kaijie Zhu
Weirong Ye
Yongfeng Zhang
Guosheng Hu
Marios Savvides
VPVLM
298
79
0
15 Aug 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
357
386
0
12 Aug 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022
Di Wang
Qiming Zhang
Yufei Xu
Jing Zhang
Bo Du
Dacheng Tao
Guang Dai
253
316
0
08 Aug 2022
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
Neural Information Processing Systems (NeurIPS), 2022
Ziyi Wang
Xumin Yu
Yongming Rao
Jie Zhou
Jiwen Lu
VPVLM, VLM
217
98
0
04 Aug 2022
Unified Normalization for Accelerating and Stabilizing Transformers
ACM Multimedia (ACM MM), 2022
Qiming Yang
Kai Zhang
Chaoxiang Lan
Zhi Yang
Zheyang Li
Wenming Tan
Jun Xiao
Shiliang Pu
165
10
0
02 Aug 2022
giMLPs: Gate with Inhibition Mechanism in MLPs
Cheng Kang
Jindřich Prokop
Lei Tong
Huiyu Zhou
Yong Hu
Daneil Novak
141
0
0
01 Aug 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Neural Information Processing Systems (NeurIPS), 2022
Yongming Rao
Wenliang Zhao
Yansong Tang
Jie Zhou
Ser-Nam Lim
Jiwen Lu
ViT
350
326
0
28 Jul 2022
Visual Recognition by Request
Computer Vision and Pattern Recognition (CVPR), 2022
Chufeng Tang
Lingxi Xie
Xiaopeng Zhang
Xiaolin Hu
Qi Tian
VLM
212
16
0
28 Jul 2022