Two-Step Data Augmentation for Masked Face Detection and Recognition: Turning Fake Masks to Real
- CVBM
The absence of large-scale masked face datasets challenges masked face detection and recognition. We propose a two-step generative data augmentation framework that combines rule-based mask warping with unpaired image-to-image translation via GANs, producing masked face samples that go beyond rule-based overlays. Trained on about 19,100 target-domain images (3.8% of IAMGAN's scale), or 59,600 images (11.8%) when out-of-domain transfer pretraining is included, the proposed approach yields consistent improvements over rule-based warping alone and achieves results complementary to IAMGAN's, showing that both steps contribute. Evaluation is conducted directly on the generated samples and is qualitative; quantitative metrics such as FID and KID were not applied, as any real reference distribution would unfairly favor the model whose training data is closer to it. We introduce a non-mask preservation loss to reduce distortions outside the mask region and stabilize training, and stochastic noise injection to enhance sample diversity.

Note: This paper originated as a coursework submission completed under resource constraints. Following an unexplained scholarship termination, the author took on part-time employment to maintain research continuity, which led to a mid-semester domain pivot from medical imaging to masked face tasks due to company data restrictions. The work was completed alongside concurrent coursework, with delayed compute access and without any AI assistance. It was submitted to a small venue at the end of the semester under an obligatory publication requirement and accepted without revision requests. Subsequent invitations to submit to first-tier venues were not pursued owing to the continued absence of funding. Downstream evaluation of recognition or detection performance was not completed by the submission deadline. This note is added in response to later comparisons and criticisms that did not account for these conditions.
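The non-mask preservation loss and stochastic noise injection mentioned above could take many forms; a minimal NumPy sketch of one plausible reading, assuming the loss is an L1 penalty restricted to pixels outside a binary mask and the noise is additive Gaussian (function names, weighting, and normalization are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def non_mask_preservation_loss(generated, source, mask):
    """L1 penalty on pixels outside the mask region, encouraging the
    generator to edit only the mask area and leave the rest of the
    face unchanged. mask: binary array, 1 inside the mask region."""
    non_mask = 1.0 - mask  # 1 where the face should be preserved
    diff = np.abs(generated - source) * non_mask
    # Normalize by the preserved area so the penalty does not depend
    # on how large the non-mask region is.
    return diff.sum() / max(non_mask.sum(), 1.0)

def inject_noise(features, weight=0.1, rng=None):
    """Stochastic noise injection: add scaled Gaussian noise to
    intermediate features to diversify generated samples."""
    rng = rng or np.random.default_rng()
    return features + weight * rng.standard_normal(features.shape)
```

In this reading, the preservation term pulls the translation GAN toward an identity map outside the mask, which is one common way such constraints also stabilize unpaired training.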