A sure independence screening procedure for ultra-high dimensional partially linear additive models

We introduce a two-step procedure for ultra-high dimensional additive models that reduces the dimension of the covariate vector and distinguishes linear from nonlinear effects among the nonzero components. In the first step, our screening procedure is constructed from the cumulative distribution function of each covariate and the conditional expectation of the response, within the framework of marginal correlation. B-splines and empirical distribution functions are used to estimate these two quantities, and the sure screening property of the procedure is established. In the second step, a double-penalization procedure is applied to identify the nonzero and the linear components simultaneously. The performance of the method is examined on several test functions under varying error distributions to assess its capabilities against competing methods. Simulation studies show that the proposed screening procedure can be applied to ultra-high dimensional data and reliably detects the influential covariates; they also demonstrate its superiority over existing methods. Finally, the method is applied to identify the most influential genes for overexpression of a G protein-coupled receptor in mice.
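The first-step screening idea (empirical CDF plus a B-spline estimate of the conditional mean of the response) can be illustrated with a minimal Python sketch. The abstract does not give the exact marginal statistic, so the utility below is an assumption and not the authors' statistic: each covariate is transformed by its empirical CDF (rank divided by n), E[Y | F_j(X_j)] is estimated by least-squares regression on a cubic B-spline basis of the transformed covariate, and the sample variance of that fitted mean is used as the marginal utility. The helper names (`bspline_basis`, `screening_utilities`, `screen`), the three equally spaced interior knots, and the default cut-off d are hypothetical choices for illustration.

```python
import numpy as np


def bspline_basis(u, n_interior=3, degree=3):
    """Clamped B-spline basis on [0, 1] with equally spaced interior knots,
    evaluated by the Cox-de Boor recursion. Returns an (n, n_interior + degree + 1) matrix."""
    u = np.asarray(u, dtype=float)
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    # Degree-0 functions: indicators of the half-open knot spans [t_i, t_{i+1}).
    B = np.zeros((u.size, knots.size - 1))
    for i in range(knots.size - 1):
        if knots[i] < knots[i + 1]:
            B[:, i] = (u >= knots[i]) & (u < knots[i + 1])
    # Assign the right endpoint u == 1 to the last non-empty span.
    last = np.nonzero(knots[:-1] < knots[1:])[0][-1]
    B[u >= 1.0, last] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_next = np.zeros((u.size, knots.size - d - 1))
        for i in range(knots.size - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                B_next[:, i] += (u - knots[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (knots[i + d + 1] - u) / right * B[:, i + 1]
        B = B_next
    return B


def screening_utilities(X, y, n_interior=3, degree=3):
    """Hypothetical marginal utility: variance of the B-spline estimate of
    E[Y | F_j(X_j)], where F_j is the empirical CDF of covariate j."""
    n, p = X.shape
    y_bar = y.mean()
    omega = np.empty(p)
    for j in range(p):
        # Empirical CDF at the sample points: rank / n, values in (0, 1] (ties ignored).
        u = (np.argsort(np.argsort(X[:, j])) + 1) / n
        B = bspline_basis(u, n_interior, degree)
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        fitted = B @ coef  # estimated conditional mean of y given F_j(X_j)
        omega[j] = np.mean((fitted - y_bar) ** 2)
    return omega


def screen(X, y, d=None, **kwargs):
    """Keep the d covariates with the largest utilities (assumed default d = floor(n / log n))."""
    n = X.shape[0]
    if d is None:
        d = int(n / np.log(n))
    omega = screening_utilities(X, y, **kwargs)
    return np.sort(np.argsort(omega)[::-1][:d])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 1000
    X = rng.standard_normal((n, p))
    # X_0 enters linearly; X_1 enters only through X_1**2 (zero population correlation with y).
    y = 2 * X[:, 0] + 2 * X[:, 1] ** 2 + 0.5 * rng.standard_normal(n)
    print(screen(X, y))
```

The cut-off d = ⌊n / log n⌋ follows a common heuristic from the sure independence screening literature and is used here only as an assumption. Because the conditional mean is estimated nonparametrically, a covariate such as X_1 above, whose effect is purely quadratic and therefore invisible to Pearson correlation, can still receive a large utility. The second step (double penalization to separate linear from nonlinear components) is not sketched here.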