
Regularization and the small-ball method II: complexity dependent error rates

Abstract

For a convex class of functions $F$, a regularization function $\Psi(\cdot)$, and given the random data $(X_i, Y_i)_{i=1}^N$, we study estimation properties of regularization procedures of the form
\begin{equation*}
\hat f \in {\rm argmin}_{f\in F}\Big(\frac{1}{N}\sum_{i=1}^N\big(Y_i-f(X_i)\big)^2+\lambda \Psi(f)\Big)
\end{equation*}
for some well-chosen regularization parameter $\lambda$. We obtain bounds on the $L_2$ estimation error rate that depend on the complexity of the "true model" $F^*:=\{f\in F: \Psi(f)\leq\Psi(f^*)\}$, where $f^*\in {\rm argmin}_{f\in F}\mathbb{E}(Y-f(X))^2$ and the $(X_i,Y_i)$'s are independent and distributed as $(X,Y)$. Our estimates hold under weak stochastic assumptions -- one of which is a small-ball condition satisfied by $F$ -- and for rather flexible choices of regularization functions $\Psi(\cdot)$. Moreover, the result holds in the learning theory framework: we do not assume any a priori connection between the output $Y$ and the input $X$. As a proof of concept, we apply our general estimation bound to various choices of $\Psi$, for example, the $\ell_p$- and $S_p$-norms (for $p\geq 1$), weak-$\ell_p$, atomic norms, the max-norm, and SLOPE. In many cases, the estimation rate almost coincides with the minimax rate in the class $F^*$.
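As a minimal sketch of the regularization procedure in the display above, one can take $F$ to be a linear class $f(x)=\langle w, x\rangle$ and $\Psi$ the $\ell_1$-norm (a LASSO-type choice); the solver below uses plain proximal gradient descent (ISTA) and is an illustrative assumption, not code or a method from the paper.

```python
import numpy as np

def regularized_ls(X, Y, lam, n_iter=500):
    """Minimize (1/N) * ||Y - X w||^2 + lam * ||w||_1 via proximal gradient (ISTA).

    Hypothetical helper illustrating the regularized least-squares procedure
    with Psi = l1-norm over the linear class f(x) = <w, x>.
    """
    N, d = X.shape
    w = np.zeros(d)
    # Step size from the Lipschitz constant of the smooth part: (2/N) * ||X||_op^2
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / N
    step = 1.0 / L
    for _ in range(n_iter):
        grad = 2.0 / N * X.T @ (X @ w - Y)   # gradient of the empirical squared loss
        z = w - step * grad                  # gradient step
        # soft-thresholding = prox of step * lam * ||.||_1
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

# Toy usage: sparse linear model with Gaussian design and noise
rng = np.random.default_rng(0)
N, d = 200, 50
X = rng.standard_normal((N, d))
w_star = np.zeros(d); w_star[:5] = 1.0
Y = X @ w_star + 0.1 * rng.standard_normal(N)
w_hat = regularized_ls(X, Y, lam=0.1)
```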
