Building segmentation is a fundamental task in the field of earth observation and aerial imagery analysis. Most existing deep learning-based methods in the literature can only be applied to imagery with a fixed or narrow range of spatial resolutions. In practical scenarios, users deal with a broad spectrum of image resolutions. Thus, a given aerial image often needs to be re-sampled to match the spatial resolution of the dataset used to train the deep learning model, which degrades segmentation performance. To overcome this, we propose a Scale-invariant Neural Network (Sci-Net) that can segment buildings in aerial images at different spatial resolutions. Specifically, our approach leverages UNet hierarchical representations and dilated convolutions to extract fine-grained multi-scale representations. Our method significantly outperforms other state-of-the-art models on the Open Cities AI dataset, with a steady improvement margin across different resolutions.
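
As a rough illustration of the idea of combining hierarchical (UNet-style) features with dilated convolutions for multi-scale extraction, the sketch below shows one plausible multi-scale block built from parallel dilated convolutions. The module name, channel sizes, and dilation rates are assumptions for illustration only and are not taken from the Sci-Net paper.

```python
# Illustrative sketch (assumption): parallel dilated convolutions fused by a
# 1x1 convolution, one common way to capture multiple receptive-field sizes
# inside a UNet-like encoder. Not the authors' actual Sci-Net implementation.
import torch
import torch.nn as nn


class DilatedMultiScaleBlock(nn.Module):
    """Extracts features at several receptive-field sizes in parallel."""

    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # 1x1 convolution fuses the concatenated multi-scale responses.
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


if __name__ == "__main__":
    block = DilatedMultiScaleBlock(in_ch=64, out_ch=64)
    feats = block(torch.randn(1, 64, 128, 128))  # e.g. a UNet encoder feature map
    print(feats.shape)  # torch.Size([1, 64, 128, 128])
```

Because each branch uses a different dilation rate, the block sees the same input at several effective scales without changing the spatial resolution of the feature map, which is the property a scale-invariant segmentation network relies on.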