
A constraints-based approach to fully interpretable neural networks for detecting learner behaviors

Abstract

The increasing use of complex machine learning models in education has led to concerns about their interpretability, which in turn has spurred interest in developing explainability techniques that are both faithful to the model's inner workings and intelligible to human end-users. In this paper, we describe a novel approach to creating a neural-network-based behavior detection model that is interpretable by design. Our model is fully interpretable, meaning that the parameters we extract for our explanations have a clear interpretation, fully capture the model's learned knowledge about the learner behavior of interest, and can be used to create explanations that are both faithful and intelligible. We achieve this by applying a series of constraints to the model that both simplify its inference process and bring it closer to a human conception of the task at hand. We train the model to detect gaming-the-system behavior, evaluate its performance on this task, and compare its learned patterns to those identified by human experts. Our results show that the model successfully learns patterns indicative of gaming-the-system behavior while providing evidence for fully interpretable explanations. We discuss the implications of our approach and suggest directions for evaluating explainability in a human-grounded manner.
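To make the idea of constraining a neural detector concrete, the sketch below shows one generic way such a constraint can be enforced in PyTorch: a single-layer behavior detector whose weights are projected to be non-negative after every optimizer step, so each input feature can only push the predicted probability upward. The specific constraint, the class and method names (ConstrainedDetector, apply_constraints), and the placeholder tensors X and y are illustrative assumptions for this sketch; the abstract does not specify the paper's actual constraints or architecture.

# Minimal sketch of a constraint-based detector; the non-negativity constraint
# shown here is an illustrative assumption, not the paper's method.
import torch
import torch.nn as nn

class ConstrainedDetector(nn.Module):
    """Single linear layer with a sigmoid output; weights are kept non-negative."""
    def __init__(self, n_features: int):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(x))

    def apply_constraints(self) -> None:
        # Project weights back onto the non-negative orthant after each update.
        with torch.no_grad():
            self.linear.weight.clamp_(min=0.0)

# Hypothetical training-loop fragment with placeholder data.
model = ConstrainedDetector(n_features=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCELoss()
X = torch.rand(32, 8)                      # placeholder learner-action features
y = torch.randint(0, 2, (32, 1)).float()   # placeholder behavior labels

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    model.apply_constraints()              # enforce the interpretability constraint

Under a constraint like this, the learned weights can be read directly as the contribution of each feature to the detected behavior, which is the kind of parameter-level interpretability the abstract describes.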

@article{pinto2025_2504.20055,
  title={A constraints-based approach to fully interpretable neural networks for detecting learner behaviors},
  author={Juan D. Pinto and Luc Paquette},
  journal={arXiv preprint arXiv:2504.20055},
  year={2025}
}