
Entropic Variable Projection for Explainability and Interpretability

Abstract

In this paper, we present a new explainability formalism designed to explain how the possible values of each input variable across a whole test set impact the predictions given by black-box decision rules. This is particularly pertinent, for instance, to temper the trust in the predictions when specific variables lie in a sensitive range of values, or more generally to explain the behaviour of machine-learning decision rules in a context represented by the test set. Our main methodological contribution is an information-theoretic framework, based on entropic projections, for computing the influence of each input-output observation when emphasizing the impact of a variable. This formalism is thus the first unified and model-agnostic framework that makes it possible to interpret the dependence between the input variables, their impact on the prediction errors, and their influence on the output predictions. Importantly, it also has a low algorithmic complexity, making it scalable to real-life large datasets. We illustrate our strategy by explaining complex decision rules learned using XGBoost and Random Forest classifiers. We finally highlight its differences from explainability strategies based on single observations, such as LIME or SHAP, by explaining the impact of different pixels on a deep-learning classifier using the MNIST database.
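To make the entropic-projection idea concrete, here is a minimal illustrative sketch (our own code, not the authors' implementation): the observations of a test set are reweighted by exponential tilting, which is the classical closed-form solution of a KL (entropic) projection under a mean constraint, so that one input variable's weighted mean is pushed toward a chosen value; the model's error statistics can then be re-examined under those weights. The variable, the target mean, and the toy error flags below are all hypothetical.

```python
import numpy as np

def entropic_weights(x, target_mean, lam_lo=-50.0, lam_hi=50.0, iters=200):
    """Weights w_i proportional to exp(lam * x_i): the reweighting of the
    empirical distribution that minimizes the KL divergence to the uniform
    weights subject to sum_i w_i * x_i = target_mean (exponential tilting).
    Assumes min(x) < target_mean < max(x)."""
    def weights(lam):
        z = lam * x
        z -= z.max()               # stabilize the softmax
        w = np.exp(z)
        return w / w.sum()
    # The tilted mean is increasing in lam, so bisection converges.
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        if weights(lam) @ x < target_mean:
            lam_lo = lam
        else:
            lam_hi = lam
    return weights(0.5 * (lam_lo + lam_hi))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                    # one input variable over a test set
errors = (x + rng.normal(size=1000) > 1.5)   # toy per-observation error flags
w = entropic_weights(x, target_mean=1.0)     # emphasize observations with x near 1
print(float(errors.mean()), float(w @ errors))  # error rate: overall vs. emphasized
```

In this toy setting, errors are more frequent for large `x`, so the weighted error rate under the emphasis exceeds the overall one, illustrating how the projection exposes a variable's impact on prediction errors without retraining or perturbing the model.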
