Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models

29 November 2025

Mohammed Mohiuddin

Syed Mohammod Minhaz Hossain

Sumaiya Khanam

Prionkar Barua

Aparup Barua

MD Tamim Hossain



Abstract

Yoga is a popular form of exercise worldwide because of its spiritual and physical health benefits, but incorrect postures can lead to injuries. Automated yoga pose classification has therefore gained importance as a way to reduce reliance on expert practitioners. While human pose keypoint extraction models have shown high potential in action recognition, systematic benchmarking for yoga pose recognition remains limited, as prior works often focus solely on raw images or on a single pose extraction model. In this study, we introduce a curated dataset, 'Yoga-16', which addresses limitations of existing datasets, and systematically evaluate three deep learning architectures (VGG16, ResNet50, and Xception) using three input modalities (direct images, MediaPipe Pose skeleton images, and YOLOv8 Pose skeleton images). Our experiments demonstrate that skeleton-based representations outperform raw image inputs, with the highest accuracy, 96.09%, achieved by VGG16 with MediaPipe Pose skeleton input. Additionally, we provide an interpretability analysis using Grad-CAM, offering insights into model decision-making for yoga pose classification, together with a cross-validation analysis.
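The evaluation design described above, three architectures crossed with three input modalities, can be sketched as a simple experiment grid. This is only an illustrative enumeration, not the authors' code: the names `ARCHITECTURES`, `MODALITIES`, and `experiment_grid` are hypothetical, and no training or results are implied.

```python
from itertools import product

# Architecture and modality names are taken from the abstract;
# the variable and function names are hypothetical.
ARCHITECTURES = ["VGG16", "ResNet50", "Xception"]
MODALITIES = [
    "direct image",
    "MediaPipe Pose skeleton image",
    "YOLOv8 Pose skeleton image",
]


def experiment_grid():
    """Return every (architecture, modality) pairing as a list of dicts."""
    return [
        {"architecture": arch, "modality": modality}
        for arch, modality in product(ARCHITECTURES, MODALITIES)
    ]


if __name__ == "__main__":
    # Print each of the nine configurations in the grid.
    for config in experiment_grid():
        print(f"{config['architecture']} + {config['modality']}")
```

Enumerating the grid up front makes it easy to check that every architecture is paired with every modality, so that comparisons between raw-image and skeleton inputs are made on equal terms.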


Main:4 Pages

15 Figures

18 Tables

Appendix:30 Pages
