More efficient approximation of smoothing splines via space-filling basis selection

Abstract

We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size $n$, the smoothing spline estimator can be expressed as a linear combination of $n$ basis functions, requiring $O(n^3)$ computational time when the number of predictors $d \geq 2$. Such a sizable computational cost hinders the broad applicability of smoothing splines. In practice, the full-sample smoothing spline estimator can be approximated by an estimator based on $q$ randomly selected basis functions, resulting in a computational cost of $O(nq^2)$. It is known that these two estimators converge at the same rate when $q$ is of the order $O\{n^{2/(pr+1)}\}$, where $p \in [1,2]$ depends on the true function $\eta$, and $r > 1$ depends on the type of spline. Such a $q$ is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting the basis functions corresponding to roughly equally spaced observations, the proposed method chooses a set of basis functions with large diversity. The asymptotic analysis shows that our proposed smoothing spline estimator can decrease $q$ to roughly $O\{n^{1/(pr+1)}\}$ when $d \leq pr+1$. Applications on synthetic and real-world datasets show that the proposed method leads to a smaller prediction error than other basis selection methods.
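
The abstract says only that the proposed method keeps the basis functions at roughly equally spaced observations. It does not give the selection rule. The sketch below is therefore an assumption. It swaps in greedy farthest-point (maximin) selection, a generic space-filling heuristic and not necessarily the authors' procedure, and compares it with the uniformly random baseline. The comparison uses the covering radius, the distance from the worst-covered observation to its nearest selected basis point. All function names are hypothetical, and the spline fit itself is omitted. `essential_q_orders` simply evaluates the two orders of $q$ quoted above, with constants dropped.

```python
import math
import random


def rescale_unit_cube(X):
    """Min-max rescale every predictor column of X (a list of rows) to [0, 1]."""
    d = len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    return [
        [(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.5 for j in range(d)]
        for row in X
    ]


def random_basis_select(n, q, seed=0):
    """Baseline: q of the n observation indices drawn uniformly without replacement."""
    return random.Random(seed).sample(range(n), q)


def maximin_basis_select(X, q):
    """Hypothetical space-filling stand-in: greedy farthest-point (maximin) selection.

    Starts at the observation nearest the centre of the unit cube, then keeps
    adding the observation farthest from every observation chosen so far.
    Cost is O(n q d), which does not exceed the O(n q^2) fit when d <= q.
    """
    U = rescale_unit_cube(X)
    n, d = len(U), len(U[0])
    if not 1 <= q <= n:
        raise ValueError("need 1 <= q <= n")
    centre = [0.5] * d
    first = min(range(n), key=lambda i: math.dist(U[i], centre))
    chosen = [first]
    # min_dist[i]: distance from observation i to its nearest chosen point (-1 once chosen)
    min_dist = [math.dist(u, U[first]) for u in U]
    min_dist[first] = -1.0
    while len(chosen) < q:
        nxt = max(range(n), key=min_dist.__getitem__)
        chosen.append(nxt)
        min_dist[nxt] = -1.0
        for i in range(n):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], math.dist(U[i], U[nxt]))
    return chosen


def covering_radius(X, idx):
    """Largest distance from any rescaled observation to its nearest selected one."""
    U = rescale_unit_cube(X)
    return max(min(math.dist(u, U[j]) for j in idx) for u in U)


def essential_q_orders(n, p, r):
    """Orders of q from the abstract, constants dropped: (random, space-filling).

    The space-filling order is only claimed for d <= p*r + 1, and only "roughly".
    """
    return n ** (2 / (p * r + 1)), n ** (1 / (p * r + 1))


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random(), rng.random()] for _ in range(500)]
    q = 20
    print("random     :", covering_radius(X, random_basis_select(len(X), q)))
    print("space-fill :", covering_radius(X, maximin_basis_select(X, q)))
    # p = 2, r = 4 are illustrative values only, not tied to a particular spline
    print("q orders for n=1e6:", essential_q_orders(10**6, 2, 4))
```

Running the script prints both covering radii and the two orders of $q$. On spread-out data the maximin rule tends to leave a smaller uncovered gap than random sampling, which is the kind of diversity the abstract appeals to. This toy comparison says nothing about prediction error itself.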
