A General Theory of Pathwise Coordinate Optimization
Pathwise coordinate optimization is one of the most important computational frameworks for solving high-dimensional convex and nonconvex sparse learning problems. It differs from classical coordinate optimization algorithms in three salient features: warm start initialization, active set updating, and a strong rule for coordinate preselection. These three features grant superior empirical performance, but also pose significant challenges to theoretical analysis. To tackle this long-standing problem, we develop a new theory showing that these three features play pivotal roles in guaranteeing the outstanding statistical and computational performance of the pathwise coordinate optimization framework. In particular, we analyze existing methods for pathwise coordinate optimization and provide new theoretical insights into them. The obtained theory motivates several modifications that improve the pathwise coordinate optimization framework and guarantee linear convergence to a unique sparse local optimum with optimal statistical properties (e.g., minimax optimality and oracle properties). This is the first result establishing the computational and statistical guarantees of the pathwise coordinate optimization framework in high dimensions. Thorough numerical experiments are provided to back up our theory.