This research explores a positive application of deepfake technology: upper-body generation of sign language for the Deaf and Hard of Hearing (DHoH) community. Given the complexity of sign language and the scarcity of sign language experts, the generated videos are vetted by a sign language expert for accuracy. We construct a reliable deepfake dataset and evaluate its technical and visual credibility using computer vision and natural language processing models. The dataset, comprising over 1,200 videos of both seen and unseen individuals, is also used to detect deepfake videos that target vulnerable individuals. Expert annotations confirm that the generated videos are comparable to real sign language content. Linguistic analysis, based on textual similarity scores and interpreter evaluations, shows that interpretations of the generated videos are at least 90% similar to those of authentic sign language. Visual analysis demonstrates that convincingly realistic deepfakes can be produced even for subjects unseen during training. Using a pose/style transfer model, we pay close attention to detail, ensuring that hand movements are accurate and aligned with the driving video. Finally, we apply machine learning algorithms to establish a baseline for deepfake detection on this dataset, contributing to the detection of fraudulent sign language videos.
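To illustrate the kind of textual similarity check described above, the following minimal sketch compares an interpreter's transcription of a generated video against the transcription of the corresponding authentic video. The abstract does not specify the similarity metric used in the paper, so this sketch assumes a simple sequence-based ratio from Python's standard library (difflib) purely as a stand-in; the helper name transcription_similarity and the example sentences are hypothetical.

    # Hypothetical sketch: scoring how closely an interpreter's transcription of a
    # generated sign language video matches the transcription of the real source.
    # The paper's actual metric is not specified here; difflib's SequenceMatcher
    # is used only as an illustrative stand-in.
    from difflib import SequenceMatcher

    def transcription_similarity(reference: str, candidate: str) -> float:
        """Return a 0-1 similarity score between two transcriptions."""
        # Normalise casing and whitespace so the comparison focuses on wording.
        ref = " ".join(reference.lower().split())
        cand = " ".join(candidate.lower().split())
        return SequenceMatcher(None, ref, cand).ratio()

    if __name__ == "__main__":
        real = "the weather tomorrow will be sunny with light winds"
        generated = "the weather tomorrow will be sunny with a light wind"
        print(f"Textual similarity: {transcription_similarity(real, generated):.2%}")

A score near or above 0.90 under such a metric would correspond to the "at least 90% similar" interpretation reported in the abstract.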
@article{naeem2025_2404.01438,
  title   = {Generation and Detection of Sign Language Deepfakes - A Linguistic and Visual Analysis},
  author  = {Shahzeb Naeem and Muhammad Riyyan Khan and Usman Tariq and Abhinav Dhall and Carlos Ivan Colon and Hasan Al-Nashash},
  journal = {arXiv preprint arXiv:2404.01438},
  year    = {2025}
}