LOCO: Distributing Ridge Regression with Random Projections

13 June 2014

C. Heinze

Brian McWilliams

N. Meinshausen

Gabriel Krummenacher

Hastagiri P. Vanchinathan


Abstract

We propose LOCO, a distributed algorithm that solves large-scale ridge regression. LOCO randomly assigns variables to different processing units, which do not communicate. Important dependencies between variables are preserved using random projections, which are cheap to compute. We show that LOCO has bounded approximation error compared to the exact ridge regression solution in the fixed design setting. Experimentally, in addition to obtaining significant speedups, LOCO achieves good predictive accuracy on a variety of large-scale regression problems. Notably, LOCO is able to solve a regression problem with 5 billion non-zeros, distributed across 128 workers, in 25 seconds.
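The scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian projection, the function names `ridge` and `loco_sketch`, and the parameters `n_workers` and `proj_dim` are assumptions made for illustration, and the loop below runs serially where a real deployment would give each block to a separate worker. Each worker keeps its own block of raw columns, sees the remaining columns only through a cheap random projection, solves a small ridge problem locally, and retains only the coefficients of its raw columns.

```python
import numpy as np


def ridge(X, y, lam):
    """Exact ridge solution: argmin_b ||y - X b||^2 + lam * ||b||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def loco_sketch(X, y, lam, n_workers, proj_dim, seed=0):
    """Serial illustration of a LOCO-style estimator (hypothetical helper).

    Assumption: a dense Gaussian random projection stands in for whatever
    projection the paper uses; the abstract does not specify it.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Randomly assign variables (columns) to workers.
    blocks = np.array_split(rng.permutation(p), n_workers)
    beta = np.zeros(p)
    for idx in blocks:  # each iteration needs no results from the others
        rest = np.setdiff1d(np.arange(p), idx)
        # Compress all columns not owned by this worker to proj_dim features.
        P = rng.standard_normal((rest.size, proj_dim)) / np.sqrt(proj_dim)
        X_local = np.hstack([X[:, idx], X[:, rest] @ P])
        b_local = ridge(X_local, y, lam)
        # Keep only the coefficients of the worker's own raw columns.
        beta[idx] = b_local[: idx.size]
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 40))
    y = X @ rng.standard_normal(40) + 0.1 * rng.standard_normal(200)
    beta_loco = loco_sketch(X, y, lam=1.0, n_workers=4, proj_dim=20)
    beta_exact = ridge(X, y, lam=1.0)
    print("relative error:",
          np.linalg.norm(beta_loco - beta_exact) / np.linalg.norm(beta_exact))
```

With a single worker there are no remaining columns to project, so the sketch reduces to the exact ridge solution. With several workers, each local problem has only its block size plus `proj_dim` columns, which is where the computational savings in this sketch come from.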
