Pointwise adaptation via stagewise aggregation of local estimates for multiclass classification

Abstract

We consider a problem of multiclass classification, where the training sample $S_n = \{(X_i, Y_i)\}_{i=1}^n$ is generated from the model $\mathbb p(Y = m \mid X = x) = \theta_m(x)$, $1 \leq m \leq M$, and $\theta_1(x), \dots, \theta_M(x)$ are unknown Lipschitz functions. Given a test point $X$, our goal is to estimate $\theta_1(X), \dots, \theta_M(X)$. An approach based on nonparametric smoothing uses a localization technique, i.e. the weight of the observation $(X_i, Y_i)$ depends on the distance between $X_i$ and $X$. However, local estimates strongly depend on the localizing scheme. In our solution we fix several schemes $W_1, \dots, W_K$, compute the corresponding local estimates $\widetilde\theta^{(1)}, \dots, \widetilde\theta^{(K)}$ and apply an aggregation procedure. We propose an algorithm that constructs a convex combination of the estimates $\widetilde\theta^{(1)}, \dots, \widetilde\theta^{(K)}$ such that the aggregated estimate behaves approximately as well as the best one in the collection. We also study theoretical properties of the procedure, prove oracle results and establish rates of convergence under mild assumptions.
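To make the setting concrete, the following is a minimal Python sketch of the two ingredients named in the abstract: local estimates under several localizing schemes, and a convex combination of them. It is not the paper's stagewise aggregation procedure. The Gaussian kernel, the bandwidth values, the one-dimensional toy data and the uniform mixing coefficients are all illustrative assumptions, and the function names `local_estimate` and `convex_combination` are hypothetical. Classes are indexed from 0 here rather than from 1.

```python
import math


def local_estimate(X, Y, x, bandwidth, num_classes):
    """Kernel-weighted class frequencies at the point x.

    Assumption: a Gaussian kernel with the given bandwidth plays the role
    of one localizing scheme W_k; the abstract does not fix this choice.
    """
    weights = [math.exp(-((xi - x) ** 2) / (2 * bandwidth ** 2)) for xi in X]
    total = sum(weights)
    estimate = [0.0] * num_classes
    for w, y in zip(weights, Y):
        estimate[y] += w / total  # normalized weight goes to the observed class
    return estimate


def convex_combination(estimates, coeffs):
    """Mix K probability vectors with nonnegative coefficients summing to 1."""
    if any(c < 0 for c in coeffs) or abs(sum(coeffs) - 1.0) > 1e-9:
        raise ValueError("coefficients must be nonnegative and sum to 1")
    num_classes = len(estimates[0])
    return [
        sum(c * est[m] for c, est in zip(coeffs, estimates))
        for m in range(num_classes)
    ]


# Toy one-dimensional sample with M = 3 classes (made-up data).
X = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
Y = [0, 0, 1, 1, 2, 2]
x_test = 0.15

# K = 3 localizing schemes, here simply three bandwidths (an assumption).
bandwidths = [0.05, 0.2, 1.0]
estimates = [local_estimate(X, Y, x_test, h, 3) for h in bandwidths]

# Placeholder coefficients: uniform mixing stands in for the data-driven
# aggregation that the paper constructs.
coeffs = [1.0 / len(estimates)] * len(estimates)
theta_hat = convex_combination(estimates, coeffs)
```

Each local estimate is a probability vector over the classes, so any convex combination of them is one as well. The small bandwidth tracks the nearest observations closely while the large one averages over almost the whole sample, which illustrates why the choice of localizing scheme matters at a given test point.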
