Consistent Targets Provide Better Supervision in Semi-supervised Object Detection
In this study, we examine the inconsistency of pseudo targets in semi-supervised object detection (SSOD). Our core observation is that oscillating pseudo targets undermine the training of an accurate semi-supervised detector: they not only inject noise into student training but also lead to severe overfitting on the classification task. We therefore propose a systematic solution, termed Consistent-Teacher, to reduce the inconsistency. First, adaptive anchor assignment (ASA) replaces the static IoU-based strategy, making the student network resistant to noisy pseudo bounding boxes. Second, we calibrate the subtask predictions with a 3D feature alignment module (FAM-3D), which allows each classification feature to adaptively query the optimal feature vector for the regression task at arbitrary scales and locations. Lastly, a Gaussian Mixture Model (GMM) dynamically revises the score threshold of the pseudo-bboxes, which stabilizes the number of ground truths at an early stage and remedies the unreliable supervision signal during training. Consistent-Teacher provides strong results on a wide range of SSOD evaluations. It achieves 40.0 mAP with a ResNet-50 backbone given only 10% of annotated MS-COCO data, surpassing previous pseudo-label baselines by around 3 mAP. When trained on fully annotated MS-COCO with additional unlabeled data, the performance further increases to 47.2 mAP. Our code will be open-sourced soon.
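To make the GMM-based thresholding idea concrete, the following is a minimal sketch and not the released implementation. It assumes that the confidence scores of one class are modeled as a two-component 1D Gaussian mixture (a low-score "negative" mode and a high-score "positive" mode), and that the threshold is taken as the lowest score whose posterior probability of belonging to the positive mode exceeds 0.5. The function names, the initialization, the 0.5 posterior cut-off, and the fallback value are all illustrative assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(scores, n_iter=50, min_var=1e-4):
    """Fit a two-component 1D GMM with plain EM (illustrative, not optimized)."""
    lo, hi = min(scores), max(scores)
    means = [lo, hi]  # assumption: start the two modes at the score extremes
    spread = max((hi - lo) ** 2 / 4.0, min_var)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        resp = []
        for s in scores:
            p = [weights[k] * gaussian_pdf(s, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component received no mass; keep its old parameters
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var = sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)
    return weights, means, variances


def positive_posterior(score, weights, means, variances):
    """Posterior probability that a score belongs to the higher-mean component."""
    pos = 0 if means[0] > means[1] else 1
    p = [weights[k] * gaussian_pdf(score, means[k], variances[k]) for k in range(2)]
    total = sum(p)
    return p[pos] / total if total > 0 else 0.0


def adaptive_threshold(scores, default=0.5):
    """Lowest score assigned to the positive mode; `default` if no split exists."""
    if len(scores) < 2 or max(scores) == min(scores):
        return default
    params = fit_two_component_gmm(scores)
    kept = [s for s in scores if positive_posterior(s, *params) > 0.5]
    return min(kept) if kept else default


# Hypothetical per-class confidence scores from a teacher model.
scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.85, 0.88, 0.90, 0.92, 0.95]
threshold = adaptive_threshold(scores)
pseudo_labels = [s for s in scores if s >= threshold]
```

Because the threshold is re-estimated from the current score distribution rather than fixed in advance, it follows the teacher's confidence as training progresses, which is the behavior the abstract attributes to the GMM component.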