Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems

25 February 2025

Abstract

Knowledge components (KCs) mapped to problems help model student learning by tracking students' mastery levels on fine-grained skills, thereby facilitating personalized learning and feedback in online learning platforms. However, crafting KCs and tagging them to problems, a task traditionally performed by human domain experts, is highly labor-intensive. We present a fully automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework that leverages these LLM-generated KCs; we refer to this approach as KCGen-KT. We conduct extensive quantitative and qualitative evaluations validating the effectiveness of KCGen-KT. On a real-world dataset of student code submissions to open-ended programming problems, KCGen-KT outperforms existing KT methods. We investigate the learning curves of the generated KCs and show that LLM-generated KCs have a level of fit comparable to that of human-written KCs under the performance factor analysis (PFA) model. We also conduct a human evaluation showing that our pipeline's KC tagging is reasonably accurate when compared to tagging by human domain experts.
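The abstract evaluates generated KCs under the performance factor analysis (PFA) model. As a rough illustration only, the sketch below implements the standard PFA prediction rule, in which the log-odds of a correct answer is a sum over the problem's tagged KCs of an easiness term plus weighted counts of the student's prior successes and failures on each KC. This is not the authors' code or their LLM-based KT framework; the KC names (`loops`, `conditionals`), the parameter values, and the helper names `pfa_predict` and `pfa_update` are hypothetical.

```python
import math
from collections import defaultdict

# Minimal sketch of the standard PFA rule (not the paper's implementation):
#   logit(p) = sum over tagged KCs k of (beta_k + gamma_k * s_k + rho_k * f_k)
# where s_k / f_k count the student's prior successes / failures on KC k.
# KC names and parameter values below are hypothetical, not fitted values.

def pfa_predict(kcs, params, successes, failures):
    """Return the predicted probability of solving a problem tagged with `kcs`."""
    logit = 0.0
    for k in kcs:
        beta, gamma, rho = params[k]  # easiness, success weight, failure weight
        logit += beta + gamma * successes[k] + rho * failures[k]
    return 1.0 / (1.0 + math.exp(-logit))

def pfa_update(kcs, correct, successes, failures):
    """After an attempt, increment the success or failure count of every tagged KC."""
    for k in kcs:
        if correct:
            successes[k] += 1
        else:
            failures[k] += 1

# Hypothetical per-KC parameters: (beta, gamma, rho)
params = {"loops": (-0.5, 0.3, 0.1), "conditionals": (0.2, 0.2, -0.1)}
successes = defaultdict(int)
failures = defaultdict(int)

# Hypothetical attempt history: (KCs tagged to the problem, was the submission correct?)
history = [(["loops"], True), (["loops"], True), (["loops"], False), (["conditionals"], True)]
for kcs, correct in history:
    pfa_update(kcs, correct, successes, failures)

# Predict performance on a new problem tagged with both KCs
problem_kcs = ["loops", "conditionals"]
p = pfa_predict(problem_kcs, params, successes, failures)
print(f"P(correct) = {p:.3f}")
```

In practice the per-KC parameters would be fit, for example by logistic regression over logged attempts. Because a different KC tagging changes only the `kcs` lists attached to each problem, PFA offers a convenient common yardstick for comparing how well two KC sets fit the same student data.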


@article{duan2025_2502.18632,
  title={Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems},
  author={Zhangqi Duan and Nigel Fernandez and Sri Kanakadandi and Bita Akram and Andrew Lan},
  journal={arXiv preprint arXiv:2502.18632},
  year={2025}
}
