
Fast rates for empirical risk minimization over càdlàg functions with bounded sectional variation norm

Abstract

Empirical risk minimization over classes of functions that are bounded for some version of the variation norm has a long history, starting with Total Variation Denoising (Rudin et al., 1992), and has been considered in several recent articles, in particular Fang et al., 2019 and van der Laan, 2015. In this article, we consider empirical risk minimization over the class $\mathcal{F}_d$ of càdlàg functions over $[0,1]^d$ with bounded sectional variation norm (also called Hardy-Krause variation). We show how a certain representation of functions in $\mathcal{F}_d$ allows one to bound the bracketing entropy of sieves of $\mathcal{F}_d$, and therefore to derive rates of convergence in nonparametric function estimation. Specifically, for sieves whose growth is controlled by some rate $a_n$, we show that the empirical risk minimizer has rate of convergence $O_P(n^{-1/3} (\log n)^{2(d-1)/3} a_n)$. Remarkably, the dimension only affects the rate in $n$ through the logarithmic factor, making this method especially appropriate for high-dimensional problems. In particular, we show that in the case of nonparametric regression over sieves of càdlàg functions with bounded sectional variation norm, this upper bound on the rate of convergence holds for least-squares estimators, under the random design, sub-exponential errors setting.
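To get a feel for how mildly the dimension $d$ enters the stated bound, one can evaluate $n^{-1/3} (\log n)^{2(d-1)/3} a_n$ numerically. The sketch below is illustrative only: it ignores the unspecified multiplicative constant, and the choice $a_n = 1$ is an assumption made purely for display.

```python
import math

def rate_bound(n: int, d: int, a_n: float = 1.0) -> float:
    """Evaluate the upper bound n^{-1/3} (log n)^{2(d-1)/3} a_n,
    up to the (unspecified) multiplicative constant."""
    return n ** (-1.0 / 3.0) * math.log(n) ** (2.0 * (d - 1) / 3.0) * a_n

# The dimension enters only through the (log n)^{2(d-1)/3} factor,
# so the bound degrades slowly as d grows:
n = 10 ** 6
for d in (1, 2, 5, 10):
    print(f"d = {d:2d}: rate bound ~ {rate_bound(n, d):.3e}")
```

For $d = 1$ the logarithmic factor disappears entirely and the bound reduces to the classical $n^{-1/3}$ rate; increasing $d$ multiplies it only by powers of $\log n$, in contrast with the exponential-in-$d$ deterioration typical of smoothness classes.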
