P-values for classification

Let $(X,Y)$ be a random variable consisting of an observed feature vector $X \in \mathcal{X}$ and an unobserved class label $Y \in \{1,2,\ldots,L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set consisting of $n$ completely observed independent copies of $(X,Y)$. Usual classification procedures provide point predictors (classifiers) $\widehat{Y}(X,\mathcal{D})$ of $Y$ or estimate the conditional distribution of $Y$ given $X$. In order to quantify the certainty of classifying $X$, we propose to construct for each $\theta = 1,2,\ldots,L$ a p-value $\pi_\theta(X,\mathcal{D})$ for the null hypothesis that $Y = \theta$, treating $Y$ temporarily as a fixed parameter. In other words, the point predictor $\widehat{Y}(X,\mathcal{D})$ is replaced with a prediction region for $Y$ with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single use and multiple use validity, as well as computational and graphical aspects.
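To make the idea of turning a classifier into per-class p-values concrete, here is a minimal sketch under stated assumptions; it is not the authors' exact procedure. It assumes the test statistic is a leave-one-out k-nearest-neighbour estimate of $P(Y=\theta \mid x)$, and the helper names `knn_score`, `class_p_values` and `prediction_region` are hypothetical. For each candidate $\theta$, the new point is temporarily labelled $\theta$, every point of class $\theta$ (including the new one) is scored symmetrically, and $\pi_\theta$ is the fraction of those points that look at least as atypical as $X$. Under $Y=\theta$ the class-$\theta$ points are exchangeable, so (assuming no distance ties) $P(\pi_\theta \le \alpha) \le \alpha$.

```python
import math
from typing import Dict, List, Sequence, Set

Point = Sequence[float]


def knn_score(i: int, xs: List[Point], ys: List[int], theta: int, k: int) -> float:
    """Hypothetical statistic: share of the k nearest neighbours of xs[i]
    (leaving xs[i] itself out) whose label is theta."""
    dists = sorted((math.dist(xs[i], xs[j]), j) for j in range(len(xs)) if j != i)
    neighbours = [j for _, j in dists[:k]]
    return sum(1 for j in neighbours if ys[j] == theta) / len(neighbours)


def class_p_values(
    x: Point,
    train_x: List[Point],
    train_y: List[int],
    labels: Sequence[int],
    k: int = 3,
) -> Dict[int, float]:
    """Return a p-value for each null hypothesis Y = theta (sketch; assumes
    non-empty training data and no ties among pairwise distances)."""
    p: Dict[int, float] = {}
    for theta in labels:
        # Treat the new point as if its label were theta (the null hypothesis).
        xs = list(train_x) + [x]
        ys = list(train_y) + [theta]
        new = len(xs) - 1
        kk = min(k, len(xs) - 1)
        # Class-theta sample, augmented by the new point.
        group = [i for i in range(len(xs)) if ys[i] == theta]
        scores = {i: knn_score(i, xs, ys, theta, kk) for i in group}
        # A low score means "atypical for class theta"; count points at least
        # as atypical as the new one (the new point itself is always counted).
        count = sum(1 for i in group if scores[i] <= scores[new])
        p[theta] = count / len(group)
    return p


def prediction_region(p: Dict[int, float], alpha: float) -> Set[int]:
    """Labels not rejected at level alpha: a (1 - alpha) prediction region for Y."""
    return {theta for theta, pv in p.items() if pv > alpha}


if __name__ == "__main__":
    train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    train_y = [1, 1, 1, 1, 2, 2, 2, 2]
    p = class_p_values((0.5, 0.5), train_x, train_y, labels=[1, 2], k=3)
    print(p)
    print(prediction_region(p, alpha=0.25))
```

In this toy example the point $(0.5, 0.5)$ sits inside the class-1 cluster, so $\pi_1 = 1$ while $\pi_2 = 1/5$. Note that with only $N_\theta$ training points of class $\theta$ a p-value can never fall below $1/(N_\theta+1)$, so at $\alpha = 0.1$ neither label is rejected and the prediction region is $\{1, 2\}$; this is the price of the nonparametric validity guarantee on small samples.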