Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems
Knowledge components (KCs) mapped to problems help model student learning by tracking students' mastery of fine-grained skills, thereby facilitating personalized learning and feedback in online learning platforms. However, crafting KCs and tagging them to problems, traditionally done by human domain experts, is highly labor-intensive. We present a fully automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework, which we refer to as KCGen-KT, to leverage these LLM-generated KCs. We conduct extensive quantitative and qualitative evaluations to validate the effectiveness of KCGen-KT. On a real-world dataset of student code submissions to open-ended programming problems, KCGen-KT outperforms existing KT methods. We investigate the learning curves of the generated KCs and show that LLM-generated KCs have a level of fit comparable to human-written KCs under the performance factor analysis (PFA) model. We also conduct a human evaluation showing that our pipeline tags KCs with reasonable accuracy compared to human domain experts.
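For readers unfamiliar with PFA, the following is a minimal sketch of its standard formulation, not the authors' implementation. It assumes the usual form in which each KC contributes an easiness term plus weighted counts of the student's prior successes and failures on that KC, and the sum is passed through a logistic function. The function name, KC names, and parameter values are hypothetical and chosen only for illustration.

```python
import math

def pfa_probability(kc_params, successes, failures, kcs):
    """Predicted probability of a correct response under standard PFA.

    kc_params: dict mapping KC name -> (beta, gamma, rho), i.e. KC easiness,
               weight on prior successes, weight on prior failures.
    successes, failures: dicts mapping KC name -> prior practice counts.
    kcs: the KCs tagged to the current problem.
    """
    logit = 0.0
    for kc in kcs:
        beta, gamma, rho = kc_params[kc]
        logit += beta + gamma * successes.get(kc, 0) + rho * failures.get(kc, 0)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical KCs and parameters (illustrative values, not from the paper).
params = {"for_loop": (-0.5, 0.3, 0.1), "if_else": (0.2, 0.2, 0.05)}
problem_kcs = ["for_loop", "if_else"]

# A student with no prior practice versus one with some practice history.
p_new = pfa_probability(params, {}, {}, problem_kcs)
p_practiced = pfa_probability(params, {"for_loop": 3}, {"if_else": 1}, problem_kcs)
```

With positive practice weights, the predicted probability rises as the student accumulates attempts on the tagged KCs. Plotting that trend per KC gives the kind of learning curve whose fit the abstract compares between LLM-generated and human-written KCs.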
@article{duan2025_2502.18632,
  title={Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems},
  author={Zhangqi Duan and Nigel Fernandez and Sri Kanakadandi and Bita Akram and Andrew Lan},
  journal={arXiv preprint arXiv:2502.18632},
  year={2025}
}