This report presents our semantic segmentation framework developed by team ACVLAB for the ICRA 2025 GOOSE 2D Semantic Segmentation Challenge, which focuses on parsing outdoor scenes into nine semantic categories under real-world conditions. Our method integrates a Swin Transformer backbone enhanced with Rotary Position Embedding (RoPE) for improved spatial generalization, alongside a Color Shift Estimation-and-Correction module designed to compensate for illumination inconsistencies in natural environments. To further improve training stability, we adopt a quantile-based denoising strategy that downweights the top 2.5\% of highest-error pixels, treating them as noise and suppressing their influence during optimization. Evaluated on the official GOOSE test set, our approach achieved a mean Intersection over Union (mIoU) of 0.848, demonstrating the effectiveness of combining color correction, positional encoding, and error-aware denoising in robust semantic segmentation.
View on arXiv@article{hsu2025_2505.06991, title={ Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding }, author={ Chih-Chung Hsu and I-Hsuan Wu and Wen-Hai Tseng and Ching-Heng Cheng and Ming-Hsuan Wu and Jin-Hui Jiang and Yu-Jou Hsiao }, journal={arXiv preprint arXiv:2505.06991}, year={ 2025 } }