From Pixels to Images: A Structural Survey of Deep Learning Paradigms in Remote Sensing Image Semantic Segmentation

21 May 2025

Main: 24 Pages

9 Figures

Bibliography: 10 Pages

Abstract

Semantic segmentation (SS) of remote sensing images (RSIs) enables the fine-grained interpretation of surface features, making it a critical task in remote sensing (RS) analysis. With the increasing diversity and volume of RSIs collected by sensors on various platforms, traditional processing methods struggle to maintain efficiency and accuracy. In response, deep learning (DL) has emerged as a transformative approach, enabling substantial advances in remote sensing image semantic segmentation (RSISS) by automating hierarchical feature extraction and improving segmentation performance across diverse modalities. As data scale and model capacity have increased, DL-based RSISS has undergone a structural evolution from pixel-level and patch-based classification to tile-level, end-to-end segmentation and, more recently, to image-level modelling with vision foundation models. However, existing reviews often focus on individual components, such as supervision strategies or fusion stages, and lack a unified operational perspective aligned with segmentation granularity and the training/inference pipeline. This paper provides a comprehensive review that organizes DL-based RSISS into a pixel-patch-tile-image hierarchy, covering early pixel-based methods, prevailing patch-based and tile-based techniques, and emerging image-based approaches. It offers a holistic and structured understanding of DL-based RSISS, highlighting representative datasets, comparative insights, and open challenges related to data scale, model efficiency, domain robustness, and multimodal integration. Furthermore, to facilitate reproducible research, curated code collections are provided at: this https URL and this https URL.
