Automated Semantic Grading of Programs
We present a new method for automatically grading introductory programming assignments. To use this method, instructors provide a reference implementation of the assignment and an error model consisting of potential corrections to errors that students might make. Using this information, the system automatically derives minimal corrections to students' incorrect solutions, providing them with a quantifiable measure of exactly how incorrect a given solution was, as well as feedback about what they did wrong. We introduce a simple language for describing error models in terms of correction rules, and formally define a rule-directed translation strategy that reduces the problem of finding minimal corrections in an incorrect program to the problem of synthesizing a correct program from a sketch. We have evaluated our system on over 1000 solution attempts by real beginner programmers. Our results show that relatively simple error models can correct, on average, 73% of the fixable submissions with non-trivial errors. We also show that the error models generalize across different problems from the same category, and that our technique scales to more complex error models and programming assignments such as those found in AP-level computer science final examinations.
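To make the correction-rule idea concrete, the Python sketch below applies a toy error model to a buggy student solution, trying variants in order of increasing correction count until one matches the reference implementation on a set of test inputs. Everything here is a hypothetical illustration: the rule syntax, ERROR_MODEL, STUDENT_SRC, and minimal_correction are invented for this example, and the enumeration is a brute-force stand-in for the paper's actual approach, which encodes the candidate corrections as holes in a sketch and solves for them with constraint-based program synthesis.

```python
import itertools

# Hypothetical error model: each entry maps a source fragment to candidate
# corrections. Leaving a fragment unchanged costs nothing; every applied
# replacement counts as one correction. The rule syntax is illustrative only.
ERROR_MODEL = [
    ("range(1, n)", ["range(0, n)", "range(1, n + 1)"]),
    ("i * i",       ["i * i * i", "i + i"]),
]

STUDENT_SRC = """
def solve(n):
    total = 0
    for i in range(1, n):   # off-by-one: misses i == n
        total = total + i * i
    return total
"""

def reference(n):
    # Instructor's reference implementation: sum of squares 1..n.
    return sum(i * i for i in range(1, n + 1))

def minimal_correction(src, error_model, tests):
    """Try variants in order of increasing correction count and return the
    first one that agrees with the reference on every test input."""
    sites = [(orig, cands) for orig, cands in error_model if orig in src]
    for k in range(len(sites) + 1):                      # cost 0, 1, 2, ...
        for subset in itertools.combinations(range(len(sites)), k):
            options = [cands if i in subset else [orig]
                       for i, (orig, cands) in enumerate(sites)]
            for picks in itertools.product(*options):
                variant = src
                for (orig, _), new in zip(sites, picks):
                    variant = variant.replace(orig, new)
                namespace = {}
                exec(variant, namespace)                 # load the candidate
                if all(namespace["solve"](t) == reference(t) for t in tests):
                    return k, variant
    return None

cost, fixed = minimal_correction(STUDENT_SRC, ERROR_MODEL, tests=range(6))
print(f"corrected with {cost} change(s):{fixed}")
```

Ordering the search by the number of applied corrections is what makes the returned fix minimal, and the correction count itself supplies the quantifiable measure of how incorrect a solution was that the abstract describes.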