
Proofs as Explanations: Short Certificates for Reliable Predictions

Annual Conference on Computational Learning Theory (COLT), 2025
Main: 12 pages
Bibliography: 4 pages
Appendix: 4 pages
Abstract

We consider a model for explainable AI in which an explanation for a prediction $h(x)=y$ consists of a subset $S'$ of the training data (if it exists) such that all classifiers $h' \in H$ that make at most $b$ mistakes on $S'$ predict $h'(x)=y$. Such a set $S'$ serves as a proof that $x$ indeed has label $y$ under the assumptions that (1) the target function $h^\star$ belongs to $H$, and (2) the set $S$ contains at most $b$ corrupted points. For example, if $b=0$ and $H$ is the family of linear classifiers in $\mathbb{R}^d$, and if $x$ lies inside the convex hull of the positive data points in $S$ (and hence every consistent linear classifier labels $x$ as positive), then Carathéodory's theorem states that $x$ lies inside the convex hull of $d+1$ of those points. So, a set $S'$ of size $d+1$ could be released as an explanation for a positive prediction, and would serve as a short proof of correctness of the prediction under the assumption of realizability.

In this work, we consider this problem more generally, for general hypothesis classes $H$ and general values $b \geq 0$. We define the notion of the robust hollow star number of $H$ (which generalizes the standard hollow star number), show that it precisely characterizes the worst-case size of the smallest certificate achievable, and analyze its size for natural classes. We also consider worst-case distributional bounds on certificate size, as well as distribution-dependent bounds that we show tightly control the sample size needed to obtain a certificate for any given test example. In particular, we define a notion of the certificate coefficient $\varepsilon_x$ of an example $x$ with respect to a data distribution $D$ and target function $h^\star$, and prove matching upper and lower bounds on sample size as a function of $\varepsilon_x$, $b$, and the VC dimension $d$ of $H$.
