Machine learning on behavioral and textual data can yield highly accurate prediction models, but these models are often very difficult to interpret. Rule-extraction techniques have been proposed to combine the desired predictive accuracy of complex "black-box" models with global explainability. However, rule extraction is challenging for high-dimensional, sparse data, where many features are relevant to the predictions: replacing the black-box model with many rules leaves the user again with an incomprehensible explanation. To address this problem, we develop and test a rule-extraction methodology based on higher-level, less-sparse metafeatures. A key finding of our analysis is that metafeature-based explanations are better at mimicking the behavior of the black-box prediction model, as measured by the fidelity of the explanations.
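The core idea can be illustrated with a toy sketch (not the paper's actual method, models, or data): a surrogate rule is fit not on the raw sparse features but on aggregated metafeatures, and its fidelity is the fraction of instances on which it agrees with the black-box model. The group structure, thresholds, and data below are all hypothetical.

```python
import random

random.seed(0)

# Toy "black-box" predictor over 100 sparse binary features:
# fires when either of two (hypothetical) feature groups is active enough.
def black_box(x):
    return int(sum(x[:50]) >= 2 or sum(x[50:]) >= 1)

# Metafeature: aggregate the sparse features into a single activity count,
# reducing 100 sparse inputs to one dense, higher-level feature.
def metafeature(x):
    return sum(x)

# A single extracted rule expressed on the metafeature instead of the
# raw features -- far more comprehensible than many raw-feature rules.
def rule(x):
    return int(metafeature(x) >= 2)

# Sparse synthetic data: each binary feature is active with probability 0.02.
data = [[int(random.random() < 0.02) for _ in range(100)] for _ in range(1000)]

# Fidelity: fraction of instances where the rule agrees with the black box.
fidelity = sum(rule(x) == black_box(x) for x in data) / len(data)
print(f"fidelity = {fidelity:.2f}")
```

The rule only approximates the black box (e.g., it misses instances where the second group alone fires once), so fidelity is below 1; the trade-off between the number of rules and fidelity is exactly what the metafeature representation is meant to improve.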