
SO-DETR: Leveraging Dual-Domain Features and Knowledge Distillation for Small Object Detection

Abstract

Detection Transformer-based methods have achieved significant advancements in general object detection. However, challenges remain in effectively detecting small objects. One key difficulty is that existing encoders struggle to efficiently fuse low-level features. Additionally, existing query selection strategies are not well tailored to small objects. To address these challenges, this paper proposes an efficient model, Small Object Detection Transformer (SO-DETR). The model comprises three key components: a dual-domain hybrid encoder, an enhanced query selection mechanism, and a knowledge distillation strategy. The dual-domain hybrid encoder integrates spatial and frequency domains to fuse multi-scale features effectively. This approach enhances the representation of high-resolution features while maintaining relatively low computational overhead. The enhanced query selection mechanism optimizes query initialization by dynamically selecting high-scoring anchor boxes using expanded IoU, thereby improving the allocation of query resources. Furthermore, by incorporating a lightweight backbone network and implementing a knowledge distillation strategy, we develop an efficient detector for small objects. Experimental results on the VisDrone-2019-DET and UAVVaste datasets demonstrate that SO-DETR outperforms existing methods with similar computational demands. The project page is available at this https URL.
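The abstract's frequency-domain branch can be illustrated with a minimal sketch. The paper's actual encoder design is not specified here, so the following is only a plausible interpretation, not the authors' implementation: transform a 2-D feature map with an FFT and amplify the high-frequency band, which carries the edges and fine details that matter most for small objects. The function name `frequency_boost` and the `alpha`/`cutoff` parameters are illustrative assumptions.

```python
import numpy as np

def frequency_boost(feat, alpha=0.5, cutoff=0.25):
    """Amplify high-frequency components of a 2-D feature map.

    feat   : 2-D array (one channel of a feature map)
    alpha  : gain applied to frequencies above the cutoff (assumed parameter)
    cutoff : radius of the low-frequency band, as a fraction of min(H, W)
    """
    # Move to the frequency domain; shift DC to the center of the spectrum.
    F = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    yy, xx = np.ogrid[:h, :w]
    # Radial distance of each frequency bin from the spectrum center.
    r = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    high = r > cutoff * min(h, w)
    # Boost only the high-frequency band; the low band passes unchanged.
    F_boosted = F * (1.0 + alpha * high)
    # Back to the spatial domain; the imaginary residue is numerical noise.
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_boosted)))
```

A constant map has all its energy at DC (inside the low band), so it passes through unchanged, while maps with fine spatial structure come out with sharpened high-frequency content.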

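The "expanded IoU" used for query selection can likewise be sketched. The paper's exact definition is not given in the abstract, so this is a hedged guess at the idea: symmetrically enlarge both boxes around their centers before computing IoU, so that a few-pixel offset on a tiny box (which can drive plain IoU to zero) still yields a usable matching score. The `scale` parameter and the symmetric-expansion rule are assumptions.

```python
def expanded_iou(box_a, box_b, scale=1.5):
    """IoU between two (x1, y1, x2, y2) boxes after enlarging each
    around its center by `scale`. More tolerant for small objects."""
    def expand(box):
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        hw, hh = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
        return cx - hw, cy - hh, cx + hw, cy + hh

    ax1, ay1, ax2, ay2 = expand(box_a)
    bx1, by1, bx2, by2 = expand(box_b)
    # Intersection of the two enlarged boxes (zero if disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0
```

With `scale=1.0` this reduces to plain IoU; with `scale > 1`, two small boxes that barely miss each other still receive a positive score, which is the property that makes ranking anchor boxes for small objects more stable.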
@article{zhang2025_2504.11470,
  title={SO-DETR: Leveraging Dual-Domain Features and Knowledge Distillation for Small Object Detection},
  author={Huaxiang Zhang and Hao Zhang and Aoran Mei and Zhongxue Gan and Guo-Niu Zhu},
  journal={arXiv preprint arXiv:2504.11470},
  year={2025}
}