Performance of empirical risk minimization in linear aggregation

24 February 2014

Abstract

We study conditions under which, given a dictionary $F=\{f_1,\dots,f_M\}$ and an i.i.d. sample $(X_i,Y_i)_{i=1}^N$, the empirical risk minimizer $\tilde f^{ERM}$ in ${\rm span}(F)$ relative to the squared loss satisfies, with high probability, \begin{equation*} R(\tilde f^{ERM})\leq \inf_{f\in {\rm span}(F)}R(f)+r_N(M), \end{equation*} where $R(\cdot)$ is the quadratic risk and $r_N(M)$ is of the order of $M/N$. We show that even if one assumes that $|Y|\leq 1$ and $|f(X)|\leq 1$ almost surely for every function in the dictionary, the empirical risk minimization (ERM) procedure may still perform poorly, with performance far from the rate $M/N$. On the other hand, under a mild assumption on $F$ (a uniform small-ball estimate for functions in ${\rm span}(F)$), ERM in ${\rm span}(F)$ does achieve the rate $M/N$.
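
To make the procedure concrete, the following is a minimal sketch of ERM in ${\rm span}(F)$ for the squared loss, which reduces to ordinary least squares on the $N\times M$ matrix with entries $f_j(X_i)$. Everything in it is an illustrative assumption rather than the paper's construction: the function names, the isotropic Gaussian dictionary (a standard favorable case for small-ball type conditions, and not bounded by 1), the noise level, and the sample sizes. It only compares the excess risk of the empirical minimizer with $M/N$ on one simulated sample.

```python
import numpy as np


def erm_linear_aggregation(features, y):
    """Least-squares ERM over span(F).

    features: (N, M) array whose (i, j) entry is f_j(X_i).
    y:        (N,) array of responses.
    Returns the coefficient vector of the empirical minimizer.
    """
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return coef


def empirical_risk(features, y, coef):
    """Average squared loss of the linear combination given by coef."""
    return float(np.mean((features @ coef - y) ** 2))


# Hypothetical simulation setup (not taken from the paper).
rng = np.random.default_rng(0)
N, M = 2000, 10
features = rng.standard_normal((N, M))        # isotropic Gaussian dictionary
true_coef = rng.standard_normal(M) / np.sqrt(M)
noise = 0.5 * rng.standard_normal(N)          # independent, mean-zero noise
y = features @ true_coef + noise

coef_hat = erm_linear_aggregation(features, y)

# With an isotropic design and independent mean-zero noise, the excess
# quadratic risk of a linear combination t over the best one t* equals
# the squared Euclidean distance ||t - t*||^2.
excess_risk = float(np.sum((coef_hat - true_coef) ** 2))

print(f"excess risk: {excess_risk:.5f}   M/N: {M / N:.5f}")
```

In this favorable setting the printed excess risk should be of the same order as $M/N$; the sketch says nothing about the bounded dictionaries for which the abstract states that ERM can fail.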
