2103.14030
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
IEEE International Conference on Computer Vision (ICCV), 2021
25 March 2021
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
HuggingFace (5 upvotes)
Github (14835★)
Papers citing
"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"
50 / 8,530 papers shown
WMamba: Wavelet-based Mamba for Face Forgery Detection
Siran Peng
Tianshuo Zhang
Li Gao
Xiangyu Zhu
Huatian Zhang
Kai Pang
Zhen Lei
Mamba
360
7
0
16 Jan 2025
NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion
Zihao Xu
Yuzhi Tang
Bowen Xu
Qingquan Li
DiffM
317
5
0
15 Jan 2025
Self Pre-training with Adaptive Mask Autoencoders for Variable-Contrast 3D Medical Imaging
IEEE International Symposium on Biomedical Imaging (ISBI), 2025
Badhan Kumar Das
Gengyan Zhao
Han Liu
Thomas J. Re
Dorin Comaniciu
Eli Gibson
Andreas Maier
ViT
MedIm
214
2
0
15 Jan 2025
Towards Lightweight Time Series Forecasting: a Patch-wise Transformer with Weak Data Enriching
IEEE International Conference on Data Engineering (ICDE), 2025
Meng Wang
Jintao Yang
Bin Yang
Hui Li
Tongxin Gong
Bo Yang
Jiangtao Cui
AI4TS
169
5
0
14 Jan 2025
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation
IEEE Transactions on Multimedia (TMM), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Yifan Wang
Pingping Zhang
Lijun Wang
Huchuan Lu
Mamba
VOS
142
14
0
14 Jan 2025
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Yunzhi Zhuge
Hongyu Gu
Lu Zhang
Jinqing Qi
Huchuan Lu
VOS
479
8
0
14 Jan 2025
Kolmogorov-Arnold Network for Remote Sensing Image Semantic Segmentation
Xianping Ma
Ziyao Wang
Yin Hu
Xiaokang Zhang
Man-On Pun
242
6
0
13 Jan 2025
Toward Realistic Camouflaged Object Detection: Benchmarks and Method
Zhimeng Xin
Tianxu Wu
Shiming Chen
Shuo Ye
Zijing Xie
Yixiong Zou
Xinge You
Yufei Guo
202
0
0
13 Jan 2025
EdgeTAM: On-Device Track Anything Model
Computer Vision and Pattern Recognition (CVPR), 2025
Chong Zhou
Chenchen Zhu
Yunyang Xiong
Saksham Suri
Fanyi Xiao
...
Raghuraman Krishnamoorthi
Bo Dai
Chen Change Loy
Vikas Chandra
Bilge Soran
VLM
315
8
0
13 Jan 2025
MathReader: Text-to-Speech for Mathematical Documents
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Sieun Hyeon
Kyudan Jung
N. Kim
Hyun Gon Ryu
Jaeyoung Do
317
5
0
13 Jan 2025
Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities
Jialin Wu
Kaikai Pan
Yanjiao Chen
Jiangyi Deng
Shengyuan Pang
Wei Dong
ViT
AAML
278
1
0
13 Jan 2025
Rethinking Knowledge in Distillation: An In-context Sample Retrieval Perspective
Jinjing Zhu
Songze Li
Lin Wang
328
0
0
13 Jan 2025
Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training
Ziqing Wen
Ping Luo
Jun Wang
Xiaoge Deng
Jinping Zou
Kun Yuan
Tao Sun
Dongsheng Li
CLL
345
0
0
13 Jan 2025
CoreNet: Conflict Resolution Network for Point-Pixel Misalignment and Sub-Task Suppression of 3D LiDAR-Camera Object Detection
Information Fusion (Inf. Fusion), 2025
Yongqian Li
Yang Yang
Zhen Lei
3DPC
274
5
0
11 Jan 2025
YO-CSA-T: A Real-time Badminton Tracking System Utilizing YOLO Based on Contextual and Spatial Attention
Yuan Lai
Zhiwei Shi
Chengxi Zhu
81
3
0
11 Jan 2025
Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity
Navin Ranjan
Andreas E. Savakis
MQ
215
5
0
10 Jan 2025
HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection
Anant Mehta
Bryant McArthur
Nagarjuna Kolloju
Zhengzhong Tu
282
4
0
10 Jan 2025
BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response
Hongruixuan Chen
Jian Song
Olivier Dietrich
Clifford Broni-Bediako
Weihao Xuan
...
Yimin Wei
J. Xia
Cuiling Lan
Konrad Schindler
Xiangwei Zhu
906
34
0
10 Jan 2025
Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph
International Journal of Computer Vision (IJCV), 2024
Donglin Di
Jiahui Yang
Chaofan Luo
Zhou Xue
Wei Chen
Xun Yang
Yue Gao
3DGS
353
19
0
10 Jan 2025
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Sheng Zhang
Yanbo Xu
Naoto Usuyama
Hanwen Xu
J. Bagga
...
Carlo Bifulco
M. Lungren
Tristan Naumann
Sheng Wang
Hoifung Poon
LM&MA
MedIm
797
455
0
10 Jan 2025
MHAFF: Multi-Head Attention Feature Fusion of CNN and Transformer for Cattle Identification
IEEE Transactions on AgriFood Electronics (TAE), 2025
Rabin Dulal
Lihong Zheng
M. A. Kabir
ViT
109
6
0
10 Jan 2025
MS-Temba: Multi-Scale Temporal Mamba for Understanding Long Untrimmed Videos
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
599
5
0
10 Jan 2025
CAMs as Shapley Value-based Explainers
The Visual Computer (Vis. Comput.), 2025
Huaiguang Cai
FAtt
233
5
0
09 Jan 2025
Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation
IEEE Transactions on Medical Imaging (IEEE TMI), 2025
Xinyu Wang
Fuling Wang
Haowen Wang
Bo Jiang
Chuanfu Li
Longji Xu
Yonghong Tian
Jin Tang
MedIm
129
5
0
08 Jan 2025
Flemme: A Flexible and Modular Learning Platform for Medical Images
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Guoqing Zhang
Jingyun Yang
Yang Li
MedIm
276
2
0
08 Jan 2025
Siamese-DETR for Generic Multi-Object Tracking
IEEE Transactions on Image Processing (IEEE TIP), 2023
Qiankun Liu
Yichen Li
Yuqi Jiang
Ying Fu
VOT
310
14
0
08 Jan 2025
AutoFish: Dataset and Benchmark for Fine-grained Analysis of Fish
Nejc Novak
Daniel Lehotský
Vasiliki Ismiroglou
Niels Madsen
T. Moeslund
Malte Pedersen
168
2
0
08 Jan 2025
Learning Informative Latent Representation for Quantum State Tomography
IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), 2023
Hailan Ma
Zhenhong Sun
Daoyi Dong
Dong Gong
310
4
0
08 Jan 2025
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
Neural Information Processing Systems (NeurIPS), 2025
Sy-Tuyen Ho
Tuan Van Vo
Somayeh Ebrahimkhani
Ngai-Man Cheung
304
1
0
08 Jan 2025
GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yan Lu
Cheng Wang
Lei Yang
Tianzhu Zhang
Yating Liu
Qi Chu
Tong He
Yonghui Li
W. Ouyang
527
16
0
08 Jan 2025
MedFocusCLIP: Improving few shot classification in medical datasets using pixel wise attention
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Aadya Arora
Vinay Namboodiri
VLM
58
3
0
08 Jan 2025
Clinical Insights: A Comprehensive Review of Language Models in Medicine
PLOS Digital Health (PDH), 2024
Nikita Neveditsin
Pawan Lingras
V. Mago
LM&MA
588
18
0
08 Jan 2025
BEN: Using Confidence-Guided Matting for Dichotomous Image Segmentation
Maxwell Meyer
Jack Spruyt
305
5
0
08 Jan 2025
NBBOX: Noisy Bounding Box Improves Remote Sensing Object Detection
IEEE Geoscience and Remote Sensing Letters (GRSL), 2024
Yechan Kim
SooYeon Kim
Moongu Jeon
ViT
362
5
0
08 Jan 2025
Tighnari: Multi-modal Plant Species Prediction Based on Hierarchical Cross-Attention Using Graph-Based and Vision Backbone-Extracted Features
Conference and Labs of the Evaluation Forum (CLEF), 2025
Haixu Liu
Penghao Jiang
Zerui Tao
Muyan Wan
Qiuzhuang Sun
90
3
0
07 Jan 2025
PARF-Net: integrating pixel-wise adaptive receptive fields into hybrid Transformer-CNN network for medical image segmentation
Xu Ma
Mengsheng Chen
Junhui Zhang
Lijuan Song
Fang Du
Zhenhua Yu
ViT
MedIm
330
0
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
466
33
0
06 Jan 2025
Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Jiaze Li
Haoran Xu
Shiding Zhu
Junwei He
Haozhao Wang
VGen
EGVM
DiffM
148
2
0
06 Jan 2025
MObI: Multimodal Object Inpainting Using Diffusion Models
Alexandru Buburuzan
Anuj Sharma
John Redford
P. Dokania
Romain Mueller
DiffM
435
4
0
06 Jan 2025
Facial Attractiveness Prediction in Live Streaming: A New Benchmark and Multi-modal Method
Haoyang Li
Xiaoyu Ren
Hongjiu Yu
Huiyu Duan
Kai Li
Ying Chen
Libo Wang
Xiongkuo Min
Guoquan Zheng
Xu Liu
CVBM
421
1
0
05 Jan 2025
Time Series Language Model for Descriptive Caption Generation
Engineering Applications of Artificial Intelligence (EAAI), 2025
M. Trabelsi
Aidan Boyd
Jin Cao
H. Uzunalioglu
AI4TS
182
8
0
03 Jan 2025
Merging Context Clustering with Visual State Space Models for Medical Image Segmentation
IEEE Transactions on Medical Imaging (IEEE TMI), 2025
Yun Zhu
Dong Zhang
Yi Lin
Yifei Feng
Jinhui Tang
Mamba
269
24
0
03 Jan 2025
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Lijie Tao
Han Zhang
Haizhao Jing
Yu Liu
Kelu Yao
Guoting Wei
Xizhe Xue
361
1
0
03 Jan 2025
DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data
Yuanpeng Tu
Xi Chen
Ser-Nam Lim
Hengshuang Zhao
450
1
0
03 Jan 2025
Double-Flow GAN model for the reconstruction of perceived faces from brain activities
Zihao Wang
Jing Zhao
Xuetong Ding
Hui Zhang
CVBM
AI4CE
253
0
0
03 Jan 2025
A Novel Shape Guided Transformer Network for Instance Segmentation in Remote Sensing Images
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE JSTARS), 2024
Dawen Yu
Shunping Ji
ViT
307
5
0
03 Jan 2025
Keypoint Aware Masked Image Modelling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Madhava Krishna
Convin.AI
456
1
0
03 Jan 2025
A Study on Context Length and Efficient Transformers for Biomedical Image Analysis
Sarah M. Hooper
Hui Xue
ViT
MedIm
60
0
0
03 Jan 2025
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yaxian Wang
Henghui Ding
Shuting He
Xudong Jiang
Bifan Wei
Jun Liu
ObjD
264
8
0
03 Jan 2025
A Separable Self-attention Inspired by the State Space Model for Computer Vision
Juntao Zhang
Shaogeng Liu
Kun Bian
Pei Zhang
Jianning Liu
Jun Zhou
Bingyan Liu
Mamba
322
0
0
03 Jan 2025