
Sparse recovery in convex hulls via entropy penalization

Abstract

Let $(X,Y)$ be a random couple in $S\times T$ with unknown distribution $P$ and let $(X_1,Y_1),\dots,(X_n,Y_n)$ be i.i.d. copies of $(X,Y)$. Denote by $P_n$ the empirical distribution of $(X_1,Y_1),\dots,(X_n,Y_n)$. Let $h_1,\dots,h_N: S\mapsto [-1,1]$ be a dictionary consisting of $N$ functions. For $\lambda\in\mathbb{R}^N$, denote $f_{\lambda}:=\sum_{j=1}^N \lambda_j h_j$. Let $\ell: T\times\mathbb{R}\mapsto\mathbb{R}$ be a given loss function and suppose it is convex with respect to the second variable. Let $(\ell\bullet f)(x,y):=\ell(y;f(x))$. Finally, let $\Lambda\subset\mathbb{R}^N$ be the simplex of all probability distributions on $\{1,\dots,N\}$. Consider the following penalized empirical risk minimization problem
\begin{eqnarray*}\hat{\lambda}^{\varepsilon}:={\mathop{argmin}_{\lambda\in\Lambda}}\Biggl[P_n(\ell\bullet f_{\lambda})+\varepsilon\sum_{j=1}^N\lambda_j\log\lambda_j\Biggr]\end{eqnarray*}
along with its distribution-dependent version
\begin{eqnarray*}\lambda^{\varepsilon}:={\mathop{argmin}_{\lambda\in\Lambda}}\Biggl[P(\ell\bullet f_{\lambda})+\varepsilon\sum_{j=1}^N\lambda_j\log\lambda_j\Biggr],\end{eqnarray*}
where $\varepsilon\geq 0$ is a regularization parameter. It is proved that the ``approximate sparsity'' of $\lambda^{\varepsilon}$ implies the ``approximate sparsity'' of $\hat{\lambda}^{\varepsilon}$, and the impact of ``sparsity'' on bounding the excess risk of the empirical solution is explored. Similar results are also discussed in the case of entropy-penalized density estimation.
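The abstract defines the estimator $\hat{\lambda}^{\varepsilon}$ but does not prescribe a method for computing it. The following is a minimal sketch of one natural approach, assuming the squared loss $\ell(y;u)=(y-u)^2$: entropic mirror descent (exponentiated gradient), whose multiplicative updates keep the iterate strictly inside the simplex $\Lambda$ and match the geometry of the negative-entropy penalty. The function name `entropy_penalized_erm` and all parameter choices are illustrative, not taken from the paper.

```python
import numpy as np

def entropy_penalized_erm(H, y, eps=0.1, lr=0.1, n_iter=2000):
    """Sketch: minimize P_n(l . f_lambda) + eps * sum_j lambda_j log lambda_j
    over the probability simplex, for the squared loss l(y; u) = (y - u)^2.

    H : (n, N) array with H[i, j] = h_j(X_i), values in [-1, 1].
    y : (n,) array of responses Y_i.

    Uses entropic mirror descent (exponentiated gradient); the paper
    itself does not specify an algorithm.
    """
    n, N = H.shape
    lam = np.full(N, 1.0 / N)                 # start at the uniform distribution
    for _ in range(n_iter):
        resid = H @ lam - y                   # f_lambda(X_i) - Y_i
        grad = 2.0 * (H.T @ resid) / n        # gradient of the empirical squared risk
        grad += eps * (np.log(np.maximum(lam, 1e-300)) + 1.0)  # entropy-penalty gradient
        lam = lam * np.exp(-lr * grad)        # multiplicative (mirror) update
        lam /= lam.sum()                      # renormalize onto the simplex
    return lam
```

For example, on a synthetic problem where the first dictionary function carries most of the signal:

```python
rng = np.random.default_rng(0)
H = rng.uniform(-1.0, 1.0, size=(200, 50))    # h_j(X_i), values in [-1, 1]
y = 0.7 * H[:, 0] + 0.05 * rng.standard_normal(200)
lam_hat = entropy_penalized_erm(H, y, eps=0.01)
```

Larger $\varepsilon$ pulls $\hat{\lambda}^{\varepsilon}$ toward the uniform distribution, while small $\varepsilon$ lets the empirical risk dominate; the paper's results concern how approximate sparsity of the population solution $\lambda^{\varepsilon}$ transfers to $\hat{\lambda}^{\varepsilon}$.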
