A common strategy for sparse linear regression is to introduce regularization, which eliminates irrelevant features by driving the corresponding weights to zero. Regularization, however, often shrinks the estimators for relevant features as well, which leads to incorrect feature selection. Motivated by this issue, we propose Bayesian masking (BM), a sparse estimation method that imposes no regularization on the weights. The key idea of BM is to introduce binary latent variables that randomly mask features; estimating the masking rates determines the relevance of each feature automatically. We derive a variational Bayesian inference algorithm that maximizes a lower bound of the factorized information criterion (FIC), a recently developed asymptotic approximation of the marginal log-likelihood. We also propose a reparametrization that accelerates convergence. We demonstrate that BM outperforms Lasso and automatic relevance determination (ARD) in terms of the sparsity-shrinkage trade-off.
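To make the masking idea concrete, below is a minimal, purely illustrative sketch of the kind of generative model the abstract describes: each feature is switched on or off by a Bernoulli mask, and a feature whose estimated masking rate collapses to zero is effectively removed without shrinking the weights themselves. All dimensions, the masking rates `pi`, and the placeholder weights are assumptions for illustration; this is not the paper's FIC-based inference algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem size for illustration.
n, d = 200, 10

# Design matrix and a "true" sparse weight vector (only 3 relevant features).
X = rng.normal(size=(n, d))
w_true = np.concatenate([rng.normal(size=3), np.zeros(d - 3)])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Per-feature masking rates pi_k = P(z_k = 1). In BM these would be estimated;
# here they are fixed to an arbitrary value purely for illustration.
pi = np.full(d, 0.5)

# One Monte Carlo draw of the binary masks and the resulting prediction.
# Irrelevant features are dropped by their masks, while the weights of
# relevant features are not penalized or shrunk.
z = rng.binomial(1, pi, size=d)
w = rng.normal(size=d)            # placeholder weights (no shrinkage prior)
y_pred = (X * z) @ w
```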