
A Multiclass Classification Approach to Label Ranking

Abstract

In multiclass classification, the goal is to learn how to predict a random label $Y$, valued in $\mathcal{Y}=\{1, \ldots, K\}$ with $K\geq 3$, based upon observing a r.v. $X$, taking its values in $\mathbb{R}^q$ with $q\geq 1$ say, by means of a classification rule $g:\mathbb{R}^q\to \mathcal{Y}$ with minimum probability of error $\mathbb{P}\{Y\neq g(X)\}$. However, in a wide variety of situations, the task targeted may be more ambitious, consisting in sorting all the possible label values $y$ that may be assigned to $X$ by decreasing order of the posterior probability $\eta_y(X)=\mathbb{P}\{Y=y \mid X\}$. This article is devoted to the analysis of this statistical learning problem, halfway between multiclass classification and posterior probability estimation (regression), and referred to here as label ranking. We highlight the fact that it can be viewed as a specific variant of ranking median regression (RMR), where, rather than observing a random permutation $\Sigma$ assigned to the input vector $X$ and drawn from a Bradley-Terry-Luce-Plackett model with conditional preference vector $(\eta_1(X), \ldots, \eta_K(X))$, the sole information available for training a label ranking rule is the label $Y$ ranked on top, namely $\Sigma^{-1}(1)$. Inspired by recent results in RMR, we prove that under appropriate noise conditions, the One-Versus-One (OVO) approach to multiclass classification yields, as a by-product, an optimal ranking of the labels with overwhelming probability. Beyond theoretical guarantees, the relevance of the approach to label ranking promoted in this article is supported by experimental results.
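The OVO strategy described above can be sketched as follows: a binary classifier is trained for every pair of labels, and each label is then ranked at a query point by its number of pairwise "wins". This is a minimal illustrative sketch (not the authors' implementation), assuming scikit-learn-style pairwise logistic regression classifiers and a hypothetical toy dataset of three Gaussian classes:

```python
# Hypothetical sketch of OVO-based label ranking: train one binary
# classifier per pair of labels, then sort labels by pairwise wins.
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression


def ovo_label_ranking(X_train, y_train, x_query):
    """Return labels sorted by decreasing number of pairwise wins at x_query."""
    labels = np.unique(y_train)
    wins = {k: 0 for k in labels}
    for a, b in combinations(labels, 2):
        # Fit a binary classifier on the subsample with labels a or b.
        mask = np.isin(y_train, [a, b])
        clf = LogisticRegression().fit(X_train[mask], y_train[mask])
        # The predicted label "wins" this pairwise duel at x_query.
        winner = clf.predict(x_query.reshape(1, -1))[0]
        wins[winner] += 1
    return sorted(labels, key=lambda k: wins[k], reverse=True)


# Toy example with K = 3 well-separated Gaussian classes in R^2.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centers])
y = np.repeat([1, 2, 3], 100)

# Querying near the center of class 1 should rank label 1 first.
ranking = ovo_label_ranking(X, y, np.array([0.0, 0.0]))
```

Under the noise conditions studied in the article, the ordering produced by such pairwise vote counts coincides with the ordering of the posterior probabilities $\eta_y(X)$ with overwhelming probability.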
