Melanoma is a malignant tumor originating from skin cell lesions. Accurate and efficient segmentation of skin lesions is essential for quantitative medical analysis but remains challenging. To address this, we propose ScaleFusionNet, a segmentation model that integrates Cross-Attention Transformer Module (CATM) and AdaptiveFusionBlock to enhance feature extraction and fusion. The model employs a hybrid architecture encoder that effectively captures both local and global features. We introduce CATM, which utilizes Swin Transformer Blocks and Cross Attention Fusion (CAF) to adaptively refine encoder-decoder feature fusion, reducing semantic gaps and improving segmentation accuracy. Additionally, the AdaptiveFusionBlock is improved by integrating adaptive multi-scale fusion, where Swin Transformer-based attention complements deformable convolution-based multi-scale feature extraction. This enhancement refines lesion boundaries and preserves fine-grained details. ScaleFusionNet achieves Dice scores of 92.94% and 91.65% on ISIC-2016 and ISIC-2018 datasets, respectively, demonstrating its effectiveness in skin lesion analysis. Our code implementation is publicly available at GitHub.
View on arXiv@article{qamar2025_2503.03327, title={ ScaleFusionNet: Transformer-Guided Multi-Scale Feature Fusion for Skin Lesion Segmentation }, author={ Saqib Qamar and Syed Furqan Qadri and Roobaea Alroobaea and Goram Mufarah M Alshmrani and Richard Jiang }, journal={arXiv preprint arXiv:2503.03327}, year={ 2025 } }