
Multiclass Classification, Information, Divergence, and Surrogate Risk

Abstract

We provide a unifying view of statistical information measures, multi-way Bayesian hypothesis testing, loss functions for multiclass classification problems, and multi-distribution $f$-divergences, elaborating equivalence results between all of these objects and extending existing results for binary outcome spaces to more general ones. We consider a generalization of $f$-divergences to multiple distributions, and we provide a constructive equivalence between divergences, statistical information (in the sense of DeGroot), and losses for multiclass classification. A major application of our results is to multiclass classification problems in which we must both infer a discriminant function $\gamma$---for making predictions on a label $Y$ from datum $X$---and a data representation (or, in the setting of a hypothesis testing problem, an experimental design), represented as a quantizer $\mathsf{q}$ from a family of possible quantizers $\mathsf{Q}$. In this setting, we characterize the equivalence between loss functions, meaning that optimizing either of two losses yields an optimal discriminant and quantizer $\mathsf{q}$, complementing and extending earlier results of Nguyen et al. to the multiclass case. Our results provide a more substantial basis than standard classification calibration results for comparing different losses: we describe the convex losses that are consistent for jointly choosing a data representation and minimizing the (weighted) probability of error in multiclass classification problems.
