In this paper, we propose a novel touchless typing interface that uses an on-screen QWERTY keyboard and a smartphone's front-facing camera. The keyboard is divided into nine color-coded clusters. The user moves their head toward the cluster containing the letter they want to type, and the front-facing camera records these head movements. A bidirectional GRU-based model, which uses pretrained embeddings rich in head-pose features, translates the recordings into cluster sequences. The model achieved accuracies of 96.78% and 86.81% under intra- and inter-user scenarios, respectively, on a dataset of 2234 video sequences collected from 22 users.
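The abstract does not specify the model's dimensions or training details, but the described pipeline (per-frame head-pose embeddings fed to a bidirectional GRU that predicts one of nine clusters) can be illustrated with a minimal PyTorch sketch. All names, layer sizes, and the 30-frame clip length below are hypothetical assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class BiGRUClusterClassifier(nn.Module):
    """Sketch: bidirectional GRU over per-frame head-pose embeddings,
    predicting one of the 9 keyboard clusters for a gesture clip.
    Dimensions are illustrative, not taken from the paper."""
    def __init__(self, embed_dim=128, hidden_dim=64, num_clusters=9):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_clusters)

    def forward(self, x):
        # x: (batch, frames, embed_dim) -- frame embeddings assumed to come
        # from a pretrained head-pose encoder, precomputed offline.
        _, h = self.gru(x)                   # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=-1)  # concatenate both directions
        return self.head(h)                  # (batch, num_clusters) logits

model = BiGRUClusterClassifier()
clips = torch.randn(4, 30, 128)  # 4 dummy clips of 30 frames each
logits = model(clips)
print(logits.shape)              # torch.Size([4, 9])
```

Decoding a typed word would then amount to running this classifier over successive gesture clips to produce a cluster sequence, which a downstream step maps back to letters.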