How Self-Supervised Learning Can be Used for Fine-Grained Head Pose Estimation?

10 August 2021

Abstract

The cost of head viewpoint labels is the main hurdle to improving fine-grained head pose estimation algorithms. One solution to the shortage of labels is Self-Supervised Learning (SSL), which can extract good features from unlabeled data for a downstream task. Accordingly, this article tries to answer the question: how can SSL be used for head pose estimation? In general, there are two main approaches to using SSL: (1) using it to pre-train the weights, and (2) leveraging it as an auxiliary task alongside Supervised Learning (SL) in a single training session. In this study, we compared the two approaches by designing a Hybrid Multi-Task Learning (HMTL) architecture and assessing it with two SSL pretext tasks, rotation and puzzling. Results showed that combining both approaches, using rotation for pre-training and puzzling for the auxiliary head, performed best. Together, they reduced the error rate by up to 13% compared to the baseline, which is comparable with current SOTA methods. Finally, we compared the impact of initial weights on HMTL and SL. With HMTL, the error was reduced for all kinds of initial weights: random, ImageNet, and SSL.
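As a rough illustration of the two pretext tasks and the auxiliary-task setup named in the abstract, the sketch below generates self-labeled rotation and puzzle samples and combines a supervised loss with an SSL loss. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the four-way 90-degree rotation, the tile grid, and the `ssl_weight` factor are all hypothetical choices made for illustration.

```python
import random


def rotate90(image, k):
    """Rotate a 2D list clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image


def rotation_sample(image, rng):
    """Rotation pretext task: the label is the rotation index,
    so no human annotation is needed (assumed 4-way setup)."""
    k = rng.randrange(4)
    return rotate90(image, k), k


def puzzle_sample(image, grid, rng):
    """Puzzling pretext task: split the image into grid x grid tiles,
    shuffle them, and use the permutation as the label (assumed setup)."""
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // grid, width // grid
    tiles = [
        [row[c * tile_w:(c + 1) * tile_w]
         for row in image[r * tile_h:(r + 1) * tile_h]]
        for r in range(grid)
        for c in range(grid)
    ]
    perm = list(range(grid * grid))
    rng.shuffle(perm)
    return [tiles[i] for i in perm], perm


def hybrid_loss(pose_loss, ssl_loss, ssl_weight=0.5):
    """Auxiliary-task setup: supervised pose loss plus a weighted SSL loss.
    The weighting scheme and the value 0.5 are assumptions."""
    return pose_loss + ssl_weight * ssl_loss


rng = random.Random(0)
image = [[r * 4 + c for c in range(4)] for r in range(4)]
rotated, rotation_label = rotation_sample(image, rng)
shuffled_tiles, permutation_label = puzzle_sample(image, 2, rng)
```

In the pre-training approach, a network would first be trained on such self-labeled samples alone and its weights reused for pose regression; in the auxiliary approach, a combined loss like `hybrid_loss` would be minimized in a single training session.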
