LOCO: Distributing Ridge Regression with Random Projections
Abstract
We propose LOCO, a distributed algorithm that solves large-scale ridge regression. LOCO randomly assigns variables to different processing units, which do not communicate with one another. Important dependencies between variables are preserved using random projections, which are cheap to compute. We show that LOCO has bounded approximation error compared to the exact ridge regression solution in the fixed design setting. Experimentally, in addition to obtaining significant speedups, LOCO achieves good predictive accuracy on a variety of large-scale regression problems. Notably, LOCO solves a regression problem with 5 billion non-zeros distributed across 128 workers in 25 seconds.
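The scheme described in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation: the function names (`ridge`, `loco_sketch`), the Gaussian projection matrix, the unnormalised ridge objective, and the choice to keep only the coefficients of each worker's own variables are assumptions made for illustration. Each loop iteration stands in for one worker, which sees its own block of variables in full plus a random projection of all remaining variables.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form ridge solution: solve (X^T X + lam * I) beta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def loco_sketch(X, y, lam, n_workers, proj_dim, seed=0):
    """Illustrative sketch only (assumed details, see lead-in).

    Columns of X are randomly split across workers. Each worker fits ridge
    on its own columns augmented with a random projection of the remaining
    columns, then keeps only the coefficients of its own columns.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Randomly assign variables (columns) to workers.
    blocks = np.array_split(rng.permutation(p), n_workers)

    beta = np.zeros(p)
    for idx in blocks:  # each iteration plays the role of one worker
        rest = np.setdiff1d(np.arange(p), idx)

        # Assumption: a dense Gaussian projection compresses the remaining
        # columns to proj_dim features; other projections could be used.
        Pi = rng.standard_normal((rest.size, proj_dim)) / np.sqrt(proj_dim)
        X_local = np.hstack([X[:, idx], X[:, rest] @ Pi])

        # Local ridge fit; the first idx.size coefficients belong to the
        # worker's own (unprojected) variables.
        coef = ridge(X_local, y, lam)
        beta[idx] = coef[: idx.size]

    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 40))
    y = X @ rng.standard_normal(40) + 0.1 * rng.standard_normal(200)

    beta_exact = ridge(X, y, lam=1.0)
    beta_approx = loco_sketch(X, y, lam=1.0, n_workers=4, proj_dim=20)
    print("relative error:",
          np.linalg.norm(beta_approx - beta_exact) / np.linalg.norm(beta_exact))
```

In this sketch the workers never exchange information during fitting, which mirrors the no-communication property stated above; the projected columns are the only channel through which a worker sees the variables it does not own. With a single worker there is nothing left to project, so the sketch reduces to exact ridge regression.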
