
Performance of empirical risk minimization in linear aggregation

Abstract

We study conditions under which, given a dictionary $F=\{f_1,\dots,f_M\}$ and an i.i.d. sample $(X_i,Y_i)_{i=1}^N$, the empirical risk minimizer $\tilde f^{ERM}$ in ${\rm span}(F)$ relative to the squared loss satisfies, with high probability,
\begin{equation*} R(\tilde f^{ERM})\leq \inf_{f\in {\rm span}(F)}R(f)+r_N(M), \end{equation*}
where $R(\cdot)$ is the quadratic risk and $r_N(M)$ is of the order of $M/N$. We show that even if one assumes that $|Y|\leq 1$ and $|f(X)|\leq 1$ almost surely for every function in the dictionary, the empirical risk minimization procedure may still perform poorly; in particular, its performance is far from the rate $M/N$. On the other hand, under mild assumptions on $F$ (a uniform small-ball estimate for functions in ${\rm span}(F)$), ERM in ${\rm span}(F)$ does achieve the rate $M/N$.
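Over a linear span, ERM with the squared loss is ordinary least squares on the matrix of dictionary values $f_j(X_i)$. The following is a minimal Python/NumPy sketch of that procedure in one toy, well-behaved setting. It assumes a hypothetical cosine dictionary $f_j(x)=\cos(j\pi x)$ with $X\sim{\rm Uniform}[0,1]$ (so $|f_j|\leq 1$), a target in ${\rm span}(F)$, and independent Gaussian noise (so $|Y|\leq 1$ is not enforced). The names `erm_linear_span`, `cosine_dictionary`, `excess_risk` and `simulate` are illustrative only. This is not the paper's construction: it does not reproduce the counterexample or the small-ball analysis.

```python
import numpy as np


def erm_linear_span(features: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return coefficients c minimizing (1/N) * sum_i (sum_j c_j f_j(X_i) - Y_i)^2.

    features[i, j] holds f_j(X_i). Over span(F) this ERM is ordinary least
    squares; np.linalg.lstsq returns the minimum-norm minimizer if the
    design matrix is rank deficient.
    """
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return coef


def cosine_dictionary(x: np.ndarray, m: int) -> np.ndarray:
    """Hypothetical dictionary f_j(x) = cos(j * pi * x), j = 1..m, so |f_j| <= 1."""
    j = np.arange(1, m + 1)
    return np.cos(np.pi * np.outer(x, j))


def excess_risk(coef: np.ndarray, target: np.ndarray) -> float:
    """R(f_coef) - inf_{span(F)} R when Y = f_target(X) + independent zero-mean noise.

    For X ~ Uniform[0, 1] the cosines are orthogonal with E[f_j(X)^2] = 1/2,
    so E[(f_coef(X) - f_target(X))^2] = 0.5 * ||coef - target||^2.
    """
    return 0.5 * float(np.sum((coef - target) ** 2))


def simulate(n: int, m: int, noise: float, seed: int = 0) -> float:
    """Draw an iid sample of size n, run ERM over span(F), return its excess risk."""
    rng = np.random.default_rng(seed)
    target = rng.uniform(-1.0, 1.0, size=m) / m  # hypothetical f* in span(F)
    x = rng.uniform(0.0, 1.0, size=n)
    features = cosine_dictionary(x, m)
    y = features @ target + noise * rng.standard_normal(n)
    coef = erm_linear_span(features, y)
    return excess_risk(coef, target)


if __name__ == "__main__":
    m, sigma = 10, 0.5
    for n in (500, 2000, 8000):
        # Excess risk of ERM next to the heuristic reference sigma^2 * M / N.
        print(n, m, simulate(n, m, noise=sigma), sigma**2 * m / n)
```

Running it prints the excess risk of ERM next to the heuristic reference $\sigma^2 M/N$. In this toy setting the two should be of comparable order, but the printed values depend on the seed and are not results from the paper.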
