P-Values for Classification

18 January 2008

Abstract

Let (X,Y) be a random variable consisting of an observed feature vector X and an unobserved class label Y in {1, 2, ..., L} with unknown joint distribution. In addition, let D be a training data set consisting of n completely observed independent copies of (X,Y). Standard classification procedures provide point predictors (classifiers) of Y or estimate the posterior distribution of Y given X. To quantify the certainty of classifying X, we propose to construct, for each b = 1, 2, ..., L, a p-value π_b(X,D) for the null hypothesis that Y = b, treating Y temporarily as a fixed parameter. In other words, point predictors are replaced with a prediction region for Y (the set of all b with π_b(X,D) > α) with a given confidence level 1 − α in a frequentist sense. We argue that (i) this approach is advantageous over traditional approaches and that (ii) any reasonable classifier can be modified to yield p-values. We discuss issues such as optimality, single-use and multiple-use validity, as well as computational and graphical aspects.
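
The abstract does not specify how π_b(X,D) is computed. One generic way to turn a dissimilarity score into per-class p-values is a rank-based (conformal-style) construction, sketched below under our own assumptions and not as the paper's method. For each candidate label b we pretend Y = b, add X to the class-b training points, score every point of that augmented sample by its nearest-neighbour distance to the rest, and report the fraction of scores at least as large as the score of X. If X and the class-b training points are exchangeable, this fraction is a valid p-value. The names `nn_distance`, `class_p_values` and `prediction_region`, and the choice of nearest-neighbour distance as the score, are hypothetical illustrations.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Point = Sequence[float]


def nn_distance(z: Point, others: List[Point]) -> float:
    """Hypothetical score: distance from z to its nearest neighbour in `others`.
    Larger values mean z looks less typical for that group."""
    return min(math.dist(z, w) for w in others)


def class_p_values(
    x: Point,
    data: List[Tuple[Point, Hashable]],
    score: Callable[[Point, List[Point]], float] = nn_distance,
) -> Dict[Hashable, float]:
    """Rank-based p-value for each null hypothesis 'Y = b' (assumed construction).
    Requires at least one training point per class."""
    pvals = {}
    for b in sorted({y for _, y in data}):
        # Augmented sample: the class-b training points plus the new point x (last).
        aug = [xi for xi, yi in data if yi == b] + [x]
        # Score each point of the augmented sample against all the others.
        scores = [score(aug[i], aug[:i] + aug[i + 1:]) for i in range(len(aug))]
        t_new = scores[-1]
        # Fraction of augmented scores at least as extreme as that of x (x included).
        pvals[b] = sum(1 for t in scores if t >= t_new) / len(aug)
    return pvals


def prediction_region(pvals: Dict[Hashable, float], alpha: float) -> Set[Hashable]:
    """All labels b whose null hypothesis 'Y = b' is not rejected at level alpha."""
    return {b for b, p in pvals.items() if p > alpha}


# Usage on a toy two-class training set (labels 1 and 2, four points each).
train = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1),
         ((5, 5), 2), ((5, 6), 2), ((6, 5), 2), ((6, 6), 2)]
pvals = class_p_values((0.5, 0.5), train)
print(pvals)                           # {1: 1.0, 2: 0.2}
print(prediction_region(pvals, 0.25))  # {1}
```

With n_b training points in class b, the smallest attainable p-value in this sketch is 1/(n_b + 1), so the region can only exclude class b at levels α ≥ 1/(n_b + 1); in the toy example, α = 0.1 still returns both labels. Any other dissimilarity, for instance one derived from an existing classifier's output, could be passed as `score`.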
