44
v1v2 (latest)

Joint Distribution-Informed Shapley Values for Sparse Counterfactual Explanations

Main:10 Pages
10 Figures
5 Tables
Appendix:13 Pages
Abstract

Counterfactual explanations (CE) aim to reveal how small input changes flip a model's prediction, yet many methods modify more features than necessary, reducing clarity and actionability. We introduce \emph{COLA}, a model- and generator-agnostic post-hoc framework that refines any given CE by computing a coupling via optimal transport (OT) between factual and counterfactual sets and using it to drive a Shapley-based attribution (\emph{pp-SHAP}) that selects a minimal set of edits while preserving the target effect. Theoretically, OT minimizes an upper bound on the W1W_1 divergence between factual and counterfactual outcomes and that, under mild conditions, refined counterfactuals are guaranteed not to move farther from the factuals than the originals. Empirically, across four datasets, twelve models, and five CE generators, COLA achieves the same target effects with only 26--45\% of the original feature edits. On a small-scale benchmark, COLA shows near-optimality.

View on arXiv
Comments on this paper