PredDiff: Explanations and Interactions from Conditional Expectations

Artificial Intelligence (AI), 2021
Abstract

PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. The underlying intuition is simple: measure how the prediction changes when feature variables are marginalized out. In this work, we clarify properties of PredDiff and put forward several extensions of the original formalism. Most notably, we introduce a new measure for interaction effects. Accounting for interactions is an essential step towards a comprehensive understanding of black-box models. Importantly, our framework readily allows investigating interactions between arbitrary feature subsets and scales linearly with their number. We demonstrate the soundness of PredDiff relevances and interactions in both classification and regression settings. To this end, we use a range of analytic, synthetic, and real-world datasets.
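
The marginalization idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. The marginalized features are imputed from an unconditional background sample rather than from a learned conditional distribution. The interaction is taken as the joint relevance of two feature subsets minus their individual relevances. The names `marginalized_prediction`, `relevance`, `interaction`, and `toy_model` are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np


def marginalized_prediction(model, x, background, subset):
    """Mean model output when the features in `subset` are replaced by background values.

    Assumption: imputing from the background sample ignores dependence on the
    remaining features; a conditional imputer would be needed in general.
    """
    imputed = np.tile(x, (len(background), 1))
    imputed[:, subset] = background[:, subset]
    return model(imputed).mean()


def relevance(model, x, background, subset):
    """Prediction change caused by marginalizing out the features in `subset`."""
    return model(x[None, :])[0] - marginalized_prediction(model, x, background, subset)


def interaction(model, x, background, subset_a, subset_b):
    """Joint relevance minus individual relevances (assumed decomposition)."""
    joint = relevance(model, x, background, list(subset_a) + list(subset_b))
    return (joint
            - relevance(model, x, background, subset_a)
            - relevance(model, x, background, subset_b))


def toy_model(X):
    """Hypothetical regression model: additive in x0, multiplicative in x1 and x2."""
    return X[:, 0] + 2.0 * X[:, 1] * X[:, 2]


# Deterministic background: every sign pattern of three features, so each
# feature has mean zero and the features are mutually uncorrelated.
background = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
x = np.array([1.0, 2.0, 3.0])

rel_x0 = relevance(toy_model, x, background, [0])
int_x0_x1 = interaction(toy_model, x, background, [0], [1])
int_x1_x2 = interaction(toy_model, x, background, [1], [2])
```

In this toy setting the additive feature `x0` shows no interaction with `x1`, whereas the multiplicative pair `x1`, `x2` yields a nonzero interaction term. Each relevance requires one batch of model evaluations per feature subset.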
