
Conditional regression for the Nonlinear Single-Variable Model

Main: 56 Pages
12 Figures
Bibliography: 7 Pages
1 Table
Abstract

Regressing a function $F$ on $\mathbb{R}^d$ without the statistical and computational curse of dimensionality requires special statistical models, for example ones that impose geometric assumptions on the distribution of the data (e.g., that its support is low-dimensional), or strong smoothness assumptions on $F$, or a special structure on $F$. Among the latter, compositional models $F=f\circ g$ with $g$ mapping to $\mathbb{R}^r$ with $r\ll d$ include classical single- and multi-index models, as well as neural networks. While the case where $g$ is linear is well-understood, less is known when $g$ is nonlinear, and in particular for which $g$'s the curse of dimensionality in estimating $F$, or both $f$ and $g$, may be circumvented. Here we consider a model $F(X):=f(\Pi_\gamma X)$ where $\Pi_\gamma:\mathbb{R}^d\to[0,\textrm{len}_\gamma]$ is the closest-point projection onto the parameter of a regular curve $\gamma:[0,\textrm{len}_\gamma]\to\mathbb{R}^d$, and $f:[0,\textrm{len}_\gamma]\to\mathbb{R}^1$. The input data $X$ is not low-dimensional: it can be as far from $\gamma$ as the condition that $\Pi_\gamma(X)$ is well-defined allows. The distribution of $X$, the curve $\gamma$ and the function $f$ are all unknown. This model is a natural nonlinear generalization of the single-index model, which corresponds to $\gamma$ being a line. We propose a nonparametric estimator, based on conditional regression, that, under suitable assumptions, the strongest of which is that $f$ is coarsely monotone, achieves, up to log factors, the $\textit{one-dimensional}$ optimal min-max rate for non-parametric regression, up to the level of noise in the observations, and can be constructed in time $\mathcal{O}(d^2 n\log n)$. All the constants in the learning bounds, in the minimal number of samples required for our bounds to hold, and in the computational complexity are at most low-order polynomials in $d$.
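
To make the model $F(X)=f(\Pi_\gamma X)$ concrete, the following is a minimal Python sketch that simulates data from it. It is an illustration only and not the estimator proposed in the paper. Everything specific here is an assumption made for the example: the curve (a quarter circle of radius 5 embedded in the first two coordinates of $\mathbb{R}^d$), the helper names `make_curve`, `project_onto_curve` and `sample_model`, the noise level, and the approximation of the closest-point projection by the nearest point on a fine discretization of the curve.

```python
import numpy as np

RADIUS = 5.0                 # assumed radius of curvature of the example curve
LENGTH = RADIUS * np.pi / 2  # arc length of a quarter circle (plays the role of len_gamma)


def make_curve(d, num_knots=2001):
    """Discretize an arc-length-parametrized quarter circle embedded in R^d."""
    t = np.linspace(0.0, LENGTH, num_knots)
    pts = np.zeros((num_knots, d))
    pts[:, 0] = RADIUS * np.cos(t / RADIUS)
    pts[:, 1] = RADIUS * np.sin(t / RADIUS)
    return t, pts


def project_onto_curve(X, t, pts):
    """Approximate Pi_gamma(X) by the arc-length parameter of the nearest knot."""
    sq_dist = (
        (X**2).sum(axis=1)[:, None]
        - 2.0 * X @ pts.T
        + (pts**2).sum(axis=1)[None, :]
    )
    return t[np.argmin(sq_dist, axis=1)]


def sample_model(n, d, f, noise_std=0.1, tube_radius=1.0, seed=0):
    """Draw (X, Y, s) with Y = f(s) + noise, where s is the curve parameter of X.

    Points are placed off the curve by an offset orthogonal to the tangent,
    with norm below tube_radius. Keeping tube_radius < RADIUS ensures the
    closest-point projection is well-defined for this particular curve.
    """
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.0, LENGTH, size=n)
    base = np.zeros((n, d))
    base[:, 0] = RADIUS * np.cos(s / RADIUS)
    base[:, 1] = RADIUS * np.sin(s / RADIUS)
    tangent = np.zeros((n, d))
    tangent[:, 0] = -np.sin(s / RADIUS)
    tangent[:, 1] = np.cos(s / RADIUS)
    # Random offset with the tangential component removed, then rescaled.
    v = rng.standard_normal((n, d))
    v -= (v * tangent).sum(axis=1, keepdims=True) * tangent
    v *= tube_radius * rng.uniform(size=(n, 1)) / np.linalg.norm(v, axis=1, keepdims=True)
    X = base + v
    Y = f(s) + noise_std * rng.standard_normal(n)
    return X, Y, s


if __name__ == "__main__":
    # A monotone link function, chosen only for illustration.
    link = lambda t: t + 0.5 * np.sin(t)
    X, Y, s = sample_model(n=500, d=20, f=link)
    t, pts = make_curve(20)
    s_hat = project_onto_curve(X, t, pts)
    print("max projection error:", np.max(np.abs(s_hat - s)))
```

In this sketch the inputs `X` fill a tube around the curve rather than lying on it, which mirrors the statement that the data is not low-dimensional, while the response depends on `X` only through its projected parameter. In the setting of the abstract the curve is unknown, so `project_onto_curve` would not be available to a learner; it is used here only to generate and check the synthetic data.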
