2
18

A Comparative Analysis of classification data mining techniques : Deriving key factors useful for predicting students performance

Abstract

Students opting for Engineering as their discipline is increasing rapidly. But due to various factors and inappropriate primary education in India, failure rates are high. Students are unable to excel in core engineering because of complex and mathematical subjects. Hence, they fail in such subjects. With the help of data mining techniques, we can predict the performance of students in terms of grades and failure in subjects. This paper performs a comparative analysis of various classification techniques, such as Na\"ive Bayes, LibSVM, J48, Random Forest, and JRip and tries to choose best among these. Based on the results obtained, we found that Na\"ive Bayes is the most accurate method in terms of students failure prediction and JRip is most accurate in terms of students grade prediction. We also found that JRip marginally differs from Na\"ive Bayes in terms of accuracy for students failure prediction and gives us a set of rules from which we derive the key factors influencing students performance. Finally, we suggest various ways to mitigate these factors. This study is limited to Indian Education system scenarios. However, the factors found can be helpful in other scenarios as well.

View on arXiv
Comments on this paper