
Gradient-based Counterfactual Explanations using Tractable Probabilistic Models

Abstract

Counterfactual examples are an appealing class of post-hoc explanations for machine learning models. Given an input x of class y_1, its counterfactual is a contrastive example x' of another class y_0. Current approaches primarily solve this task through a complex optimization: they define an objective function based on the loss of the counterfactual outcome y_0, with hard or soft constraints, and then optimize this function as a black box. This "deep learning" approach, however, is rather slow, sometimes tricky, and may yield unrealistic counterfactual examples. In this work, we propose a novel approach that addresses these problems using only two gradient computations based on tractable probabilistic models. First, we compute an unconstrained counterfactual u of x that induces the counterfactual outcome y_0. Then, we adapt u toward higher-density regions, resulting in x'. Empirical evidence demonstrates the clear advantages of our approach.
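The two-step procedure can be illustrated with a minimal toy sketch. Everything below is an assumption for illustration, not the paper's actual setup: the classifier is a hand-set logistic regression, the "tractable probabilistic model" is stood in for by a single Gaussian N(mu, sigma^2 I) with a closed-form log-density gradient, and the step sizes eta1 and eta2 are arbitrary. Step 1 takes one gradient step on the classifier loss toward the counterfactual class to get the unconstrained counterfactual u; step 2 takes one gradient step up the density model's log-density to move u into a higher-density region, giving x'.

```python
import numpy as np

# Toy classifier: logistic regression with fixed (illustrative) parameters.
w = np.array([2.0, -1.0])
b = 0.0

# Toy density model: a single Gaussian N(mu, sigma^2 I) standing in for a
# tractable probabilistic model with a differentiable log-density.
mu = np.array([0.0, 0.0])
sigma = 1.0

def classifier_logit(x):
    return w @ x + b

def grad_loss_wrt_x(x, target):
    # Gradient of the binary cross-entropy loss w.r.t. the input x,
    # for the desired (counterfactual) label `target`.
    p = 1.0 / (1.0 + np.exp(-classifier_logit(x)))
    return (p - target) * w

def grad_log_density(x):
    # Gradient of log N(x; mu, sigma^2 I) w.r.t. x (closed form).
    return (mu - x) / sigma**2

def counterfactual(x, target, eta1=5.0, eta2=0.5):
    # Step 1: one gradient step on the classifier loss toward the
    # counterfactual class y_0 -> unconstrained counterfactual u.
    u = x - eta1 * grad_loss_wrt_x(x, target)
    # Step 2: one gradient step up the log-density to pull u toward a
    # higher-density (more realistic) region -> x'.
    return u + eta2 * grad_log_density(u)

x = np.array([1.0, 0.5])            # originally classified as class 1 (logit > 0)
x_cf = counterfactual(x, target=0.0)
print(classifier_logit(x) > 0, classifier_logit(x_cf) < 0)  # → True True
```

In this toy instance the second step simply shrinks u toward the Gaussian mean, which is exactly the intended effect: the prediction still flips to y_0, but x' sits in a region the density model considers more plausible than u.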
