On the (In)fidelity and Sensitivity of Explanations

Abstract

We consider objective evaluation measures for explanations of complex black-box machine learning models. We propose simple, robust variants of two notions from recent literature: (in)fidelity and sensitivity. We analyze the optimal explanations with respect to both measures: while the optimal explanation for sensitivity is a vacuous constant explanation, the optimum for our notion of infidelity is a novel combination of two popular explanation methods. Another salient question, given these measures, is how to modify any given explanation to achieve better values under them. We propose a simple modification based on lowering sensitivity, and moreover show that, when done appropriately, it can simultaneously improve both sensitivity and fidelity.
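The two measures can be sketched concretely: infidelity measures the expected squared gap between the inner product of a perturbation with the explanation and the resulting change in the model's output, while max-sensitivity measures how much the explanation itself can change under small input perturbations. Below is a minimal Monte Carlo sketch, not the paper's implementation; the function names, the Gaussian perturbation distribution, and the estimator details are illustrative assumptions.

```python
import math
import random

def estimate_infidelity(f, expl, x, n_samples=1000, sigma=0.5, seed=0):
    """Monte Carlo estimate of infidelity at x:
    E_I[(I . Phi - (f(x) - f(x - I)))^2], with Gaussian perturbations I
    as one common choice of perturbation distribution."""
    rnd = random.Random(seed)
    phi = expl(f, x)  # explanation vector Phi(f, x)
    total = 0.0
    for _ in range(n_samples):
        pert = [rnd.gauss(0.0, sigma) for _ in x]
        dot = sum(i * p for i, p in zip(pert, phi))
        drop = f(x) - f([xi - ii for xi, ii in zip(x, pert)])
        total += (dot - drop) ** 2
    return total / n_samples

def estimate_max_sensitivity(f, expl, x, radius=0.1, n_samples=100, seed=0):
    """Monte Carlo estimate of max-sensitivity: the largest change
    ||expl(f, y) - expl(f, x)|| over random points y with ||y - x|| <= radius."""
    rnd = random.Random(seed)
    phi = expl(f, x)
    best = 0.0
    for _ in range(n_samples):
        d = [rnd.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(di * di for di in d))
        scale = radius * rnd.random() / norm  # random point inside the ball
        y = [xi + scale * di for xi, di in zip(x, d)]
        phi_y = expl(f, y)
        best = max(best, math.sqrt(sum((a - b) ** 2 for a, b in zip(phi_y, phi))))
    return best
```

As a sanity check, for a linear model with its gradient as the explanation, the perturbation inner product matches the output change exactly, so the infidelity estimate is (numerically) zero, and since the explanation is constant, the max-sensitivity estimate is zero as well.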
