Performance of empirical risk minimization in linear aggregation
Abstract
We study conditions under which, given a dictionary $F=\{f_1,\ldots,f_M\}$ and an i.i.d. sample $(X_i,Y_i)_{i=1}^N$, the empirical minimizer in ${\rm span}(F)$ relative to the squared loss satisfies, with high probability,
\begin{equation*} R(\tilde f^{ERM})\leq \inf_{f\in {\rm span}(F)}R(f)+r_N(M), \end{equation*}
where $R(\cdot)$ is the quadratic risk and $r_N(M)$ is of the order of $M/N$. We show that even if one assumes that $|Y|\leq 1$ and $|f(X)|\leq 1$ almost surely for every function $f$ in the dictionary, the empirical risk minimization (ERM) procedure may still perform poorly; in particular, its performance is far from the rate $M/N$. On the other hand, under mild assumptions on $F$ (a uniform small-ball estimate for functions in ${\rm span}(F)$), ERM in ${\rm span}(F)$ does achieve the rate $M/N$.
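As an illustration only, and not taken from the paper, the following minimal sketch shows what ERM over the span of a dictionary amounts to under the squared loss: ordinary least squares on the $N \times M$ matrix of dictionary evaluations. The function name `erm_linear_aggregation`, the Gaussian design, the noise level `sigma`, and the sample sizes are all assumptions chosen for the example. An isotropic Gaussian design with independent noise is a favorable case, so the excess risk should come out on the order of $M/N$.

```python
import numpy as np

def erm_linear_aggregation(features, y):
    """Least-squares ERM over the span of a dictionary (hypothetical helper).

    features: (N, M) array whose (i, j) entry is f_j(X_i).
    y:        (N,) array of responses Y_i.
    Returns the coefficient vector of the empirical minimizer.
    """
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return coef

# Assumed toy setting: f_j(X) is the j-th coordinate of an isotropic
# Gaussian vector X, and Y = <t_star, X> + independent Gaussian noise.
rng = np.random.default_rng(0)
N, M, sigma = 2000, 20, 1.0
t_star = rng.normal(size=M)
Phi = rng.normal(size=(N, M))
y = Phi @ t_star + sigma * rng.normal(size=N)

t_hat = erm_linear_aggregation(Phi, y)

# With identity covariance, the excess quadratic risk of the ERM over the
# best element of the span equals the squared Euclidean coefficient error.
excess_risk = float(np.sum((t_hat - t_star) ** 2))
rate = sigma**2 * M / N

print(f"excess risk: {excess_risk:.4f}, sigma^2 * M / N: {rate:.4f}")
```

The sketch says nothing about the negative result in the abstract: a bounded dictionary on which ERM fails to reach the rate requires a design that violates the small-ball condition, which this toy example does not attempt to construct.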
