VisioPhysioENet: Multimodal Engagement Detection using Visual and Physiological Signals

Abstract

This paper presents VisioPhysioENet, a novel multimodal system that leverages visual and physiological signals to detect learner engagement. It employs a two-level approach for extracting both visual and physiological features. For visual feature extraction, Dlib is used to detect facial landmarks, while OpenCV provides additional estimations. The face_recognition library, built on Dlib, identifies the facial region of interest specifically for physiological signal extraction, and physiological signals are then extracted with the plane-orthogonal-to-skin (POS) method to assess cardiovascular activity. These features are integrated using advanced machine learning classifiers, enhancing the detection of various levels of engagement. We rigorously evaluated VisioPhysioENet on the DAiSEE dataset, where it achieves an accuracy of 63.09%, demonstrating a superior ability to discern different levels of engagement compared to many existing methods and outperforming by 8.6% the only other model that uses both physiological and visual features.
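
The sketch below illustrates the physiological branch described above: a per-frame facial region of interest obtained with the face_recognition library, followed by the plane-orthogonal-to-skin (POS) projection applied to the averaged RGB trace. The helper names, the 1.6 s window, and the single-face assumption are illustrative choices, not the paper's exact implementation.

import cv2
import face_recognition
import numpy as np

def roi_mean_rgb(frame_bgr):
    """Return the mean (R, G, B) over the detected facial region, or None if no face is found."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    boxes = face_recognition.face_locations(rgb)  # [(top, right, bottom, left), ...]
    if not boxes:
        return None
    top, right, bottom, left = boxes[0]           # assume a single learner in the frame
    roi = rgb[top:bottom, left:right]
    return roi.reshape(-1, 3).mean(axis=0)

def pos_pulse_signal(rgb_traces, fps, window_sec=1.6):
    """Plane-orthogonal-to-skin (POS) pulse estimation from per-frame mean RGB values.

    rgb_traces: array of shape (N, 3); fps: video frame rate.
    Returns a 1-D pulse signal of length N built by overlap-adding windowed estimates.
    """
    n = len(rgb_traces)
    win = int(window_sec * fps)
    pulse = np.zeros(n)
    projection = np.array([[0.0, 1.0, -1.0],      # S1 = G - B
                           [-2.0, 1.0, 1.0]])     # S2 = -2R + G + B
    for t in range(n - win + 1):
        block = np.asarray(rgb_traces[t:t + win], dtype=float)
        cn = block / (block.mean(axis=0) + 1e-9)  # temporal normalization within the window
        s = cn @ projection.T                     # project onto the plane orthogonal to skin tone
        h = s[:, 0] + (s[:, 0].std() / (s[:, 1].std() + 1e-9)) * s[:, 1]
        pulse[t:t + win] += h - h.mean()          # overlap-add the zero-mean window estimate
    return pulse

In a pipeline of this kind, the resulting pulse signal would typically be summarized into cardiovascular features and combined with the Dlib/OpenCV visual features before being passed to the machine learning classifiers.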

@article{singh2025_2409.16126,
  title={VisioPhysioENet: Multimodal Engagement Detection using Visual and Physiological Signals},
  author={Alakhsimar Singh and Nischay Verma and Kanav Goyal and Amritpal Singh and Puneet Kumar and Xiaobai Li},
  journal={arXiv preprint arXiv:2409.16126},
  year={2025}
}