Fully Bayesian Logistic Regression with Hyper-Lasso Priors for High-dimensional Feature Selection

High-dimensional feature selection arises in many areas of modern science. For example, in genomic research we want to find the genes that can be used to separate tissues of different classes (e.g., cancer and normal) from tens of thousands of genes that are active (expressed) in certain tissue cells. To this end, we wish to fit regression and classification models with a large number of features (also called variables or predictors), which remains a tremendous challenge to date. In the past few years, penalized likelihood methods for fitting regression models based on hyper-lasso penalization have been explored extensively in the literature. However, fully Bayesian methods that use Markov chain Monte Carlo (MCMC) to fit regression and classification models with hyper-lasso priors have received little investigation. In this paper, we introduce a new class of methods for fitting Bayesian logistic regression models with hyper-lasso priors using Hamiltonian Monte Carlo within a restricted Gibbs sampling framework; we call our methods BLRHT for short. We conduct intensive simulation studies that compare BLRHT with LASSO and investigate how to choose the heaviness and scale of the prior in BLRHT. The main findings are that the choice of prior heaviness plays a critical role in BLRHT, and that BLRHT is relatively robust to the choice of prior scale. We further demonstrate and investigate BLRHT in an application to a real microarray data set related to prostate cancer, which confirms these findings. An R add-on package called \texttt{BLRHL} will be available from \url{http://math.usask.ca/~longhai/software/BLRHT}.
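The abstract does not give implementation details, but the core idea (alternating a Gibbs draw of per-coefficient prior scales with a Hamiltonian Monte Carlo update of the regression coefficients) can be sketched in a minimal form. The following is not the authors' implementation: it uses a Student-t prior (a representative heavy-tailed, hyper-lasso-type prior) expressed as a normal/inverse-gamma scale mixture, and all data, prior settings (`nu`, `s`) and HMC tuning parameters (`eps`, `n_leap`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post_grad(beta, X, y, sigma2):
    """Log posterior of beta (up to a constant) and its gradient, conditional
    on the per-coefficient prior variances sigma2 (the Gibbs 'scales' block)."""
    eta = X @ beta
    lp = float(y @ eta - np.sum(np.logaddexp(0.0, eta))
               - 0.5 * np.sum(beta**2 / sigma2))
    p = np.exp(eta - np.logaddexp(0.0, eta))  # numerically stable sigmoid
    return lp, X.T @ (y - p) - beta / sigma2

def hmc_update(beta, X, y, sigma2, eps=0.01, n_leap=30):
    """One HMC update of beta via leapfrog integration, holding scales fixed."""
    r0 = rng.standard_normal(beta.size)
    lp0, g = log_post_grad(beta, X, y, sigma2)
    b, r = beta.copy(), r0 + 0.5 * eps * g
    for i in range(n_leap):
        b = b + eps * r
        lp, g = log_post_grad(b, X, y, sigma2)
        if i < n_leap - 1:
            r = r + eps * g
    r = r + 0.5 * eps * g
    # Metropolis accept/reject on the Hamiltonian difference.
    if np.log(rng.uniform()) < (lp - 0.5 * r @ r) - (lp0 - 0.5 * r0 @ r0):
        return b
    return beta

def gibbs_scales(beta, nu=1.0, s=0.2):
    """Conjugate draw of sigma_j^2 | beta_j when beta_j has a Student-t(nu)
    prior with scale s, written as a normal / inverse-gamma scale mixture."""
    shape = 0.5 * (nu + 1.0)
    rate = 0.5 * (nu * s**2 + beta**2)
    return 1.0 / rng.gamma(shape, 1.0 / rate)

# Tiny synthetic illustration: 3 of 20 features carry signal.
n, p = 120, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 1.5]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

beta, draws = np.zeros(p), []
for it in range(300):
    sigma2 = gibbs_scales(beta)            # Gibbs step for the local scales
    beta = hmc_update(beta, X, y, sigma2)  # HMC step for the coefficients
    if it >= 100:                          # discard burn-in
        draws.append(beta.copy())
post_mean = np.mean(np.asarray(draws), axis=0)
```

The heaviness of the prior here corresponds to the degrees of freedom `nu` (smaller is heavier-tailed) and its scale to `s`, which are the two choices the paper's simulation studies examine.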