Simultaneously achieving parsimony and good predictive power in high dimensions is a major challenge in statistics. Non-local priors (NLPs) possess appealing properties for high-dimensional model choice, but their use for estimation has not been studied in detail. We show that, for regular models, Bayesian model averaging (BMA) estimates based on NLPs shrink spurious parameters at either fast polynomial or quasi-exponential rates as the sample size increases, depending on the chosen prior density. Non-spurious parameter estimates only differ from the oracle MLE by a factor of . We extend some results to linear models with dimension growing with the sample size. Coupled with our theoretical investigations, we outline the constructive representation of NLPs as mixtures of truncated distributions. From a practitioner's perspective, our work enables simple posterior sampling and extends NLPs beyond previous proposals. Our results show notable high-dimensional estimation for linear models with at reduced computational cost. In simulations, NLPs provided lower estimation error than benchmark and hyper-g priors, SCAD and LASSO; in gene expression data, they achieved higher cross-validated with an order of magnitude fewer predictors. Remarkably, these results were obtained without the need to pre-screen predictors. Our findings contribute to the debate on whether different priors should be used for estimation and model selection, showing that selection priors may actually be desirable for high-dimensional estimation.