An Efficient Proximal Gradient Method for General Structured Sparse Learning
We study the problem of learning high-dimensional regression models regularized by a structured-sparsity-inducing penalty that encodes prior structural information on either the input or the output side. We consider two widely adopted types of such penalties as our motivating examples: (1) the overlapping-group-lasso penalty, based on a mixed norm, and (2) the graph-guided fusion penalty. For both types of penalties, developing an efficient optimization method has remained challenging because of their non-separability. In this paper, we propose a general optimization framework, called the proximal gradient method, which can solve structured sparse learning problems with a smooth convex loss and a wide spectrum of non-smooth and non-separable structured-sparsity-inducing penalties, including the overlapping-group-lasso and graph-guided fusion penalties. Our method exploits the structure of such penalties: it decouples the non-separable penalty function via the dual norm, introduces a smooth approximation of it, and minimizes this approximation. It achieves a convergence rate significantly faster than that of the standard first-order method, the subgradient method, and is much more scalable than the most widely used alternative, the interior-point method applied to second-order cone programming and quadratic programming formulations. The efficiency and scalability of our method are demonstrated on both simulated and real genetic datasets.
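As a concrete illustration of the dual-norm decoupling and smoothing steps described above, the following is a minimal NumPy sketch for the overlapping-group-lasso case. It assumes a squared-error loss, unit group weights, a fixed smoothing parameter mu, and a FISTA-style accelerated update; all function names, parameters, and group definitions are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal sketch of the smoothing-plus-proximal-gradient idea, assuming a
# squared-error loss and an overlapping-group-lasso penalty with unit group
# weights. Names and parameter choices are illustrative, not the paper's code.
import numpy as np


def smoothed_penalty_grad(beta, groups, gamma, mu):
    """Gradient of the Nesterov-smoothed penalty gamma * sum_g ||beta_g||_2.

    Writing the penalty as max_{||alpha_g||_2 <= 1} sum_g gamma * alpha_g^T beta_g
    (the dual-norm decoupling), the smoothed maximizer is the projection of
    gamma * beta_g / mu onto the unit l2 ball, taken group by group.
    """
    grad = np.zeros_like(beta)
    for g in groups:
        a = gamma * beta[g] / mu
        norm = np.linalg.norm(a)
        if norm > 1.0:
            a = a / norm                      # project onto the unit l2 ball
        grad[g] += gamma * a                  # accumulate C^T alpha* over (overlapping) groups
    return grad


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the separable l1 part is handled exactly)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def smoothing_proximal_gradient(X, y, groups, gamma=1.0, lam=0.1, mu=1e-3, n_iter=500):
    """Accelerated proximal gradient on the smoothed objective
    0.5 * ||y - X beta||^2 + smoothed group penalty + lam * ||beta||_1."""
    p = X.shape[1]
    # Lipschitz constant of the smooth part: lambda_max(X^T X) plus ||C||^2 / mu,
    # where ||C||^2 = gamma^2 times the largest number of groups sharing a coordinate.
    overlap = np.zeros(p)
    for g in groups:
        overlap[g] += 1.0
    L = np.linalg.norm(X, 2) ** 2 + (gamma ** 2) * overlap.max() / mu
    beta = np.zeros(p)
    w = beta.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) + smoothed_penalty_grad(w, groups, gamma, mu)
        beta_new = soft_threshold(w - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        w = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)
        beta, t = beta_new, t_new
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    true_beta = np.zeros(20)
    true_beta[:5] = 1.0
    y = X @ true_beta + 0.1 * rng.standard_normal(100)
    groups = [np.arange(0, 8), np.arange(5, 14), np.arange(12, 20)]  # overlapping groups
    print(np.round(smoothing_proximal_gradient(X, y, groups), 3))
```

Under the same template, a graph-guided fusion penalty would presumably change only the linear map encoded in smoothed_penalty_grad (edge differences instead of group memberships) and the set onto which the dual variables are projected; the outer proximal gradient loop stays the same.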