A Convex Formulation for Mixed Regression: Near Optimal Rates in the Face of Noise
Mixture models represent the superposition of statistical processes and arise naturally in machine learning and statistics. Despite their prevalence and importance, little is known about efficient algorithms with strong statistical guarantees. The Expectation-Maximization (EM) approach is perhaps the most popular, yet, save for the Gaussian mixture clustering problem, few guarantees exist. For mixed regression in the noiseless setting, in particular, only recently has an EM-like algorithm been able to provide near-tight guarantees. In the noisy setting, tensor approaches have recently succeeded in providing a tractable and statistically consistent approach, but they require a sample complexity much higher than the information-theoretic limits. We consider the mixed regression problem with two components, under adversarial and stochastic noise. Using a lifting to a higher-dimensional space, we give a convex optimization formulation that provably recovers the true solution, with near-optimal sample complexity and error bounds. In the noiseless setting, our results remove the extra log factor from recent (and best-to-date) work, thus achieving exact recovery with order-wise as many samples as the degrees of freedom. In the general setting with noise, our results give the first (and currently the only known) tractable algorithm guaranteeing successful recovery with tight bounds on sample complexity and recovery errors.
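To make the lifting concrete, the sketch below (in Python with cvxpy) shows one natural instantiation of the idea for two-component mixed linear regression: in the noiseless case each observation satisfies (y_i - <x_i, b1>)(y_i - <x_i, b2>) = 0, which is linear in the lifted variables g = b1 + b2 and K = (b1 b2^T + b2 b1^T)/2, so one can search for a low-rank K via a convex nuclear-norm surrogate. The specific objective, constraint tolerance eta, and unlifting step here are our illustrative choices and may differ from the paper's exact program.

# A minimal sketch (not the paper's exact program) of the lifting idea.
# Noiseless model: y_i = <x_i, b1> or y_i = <x_i, b2>, hence
#   y_i^2 - y_i <x_i, g> + x_i^T K x_i = 0,
# with lifted variables g = b1 + b2 and K = (b1 b2^T + b2 b1^T)/2.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, eta = 200, 10, 1e-4        # samples, dimension, constraint tolerance (assumed)

# Synthetic two-component mixed regression data, noiseless for simplicity.
b1, b2 = rng.standard_normal(d), rng.standard_normal(d)
X = rng.standard_normal((n, d))
labels = rng.integers(0, 2, size=n)
y = np.where(labels == 0, X @ b1, X @ b2)

# Lifted convex program: a low-rank surrogate (nuclear norm of K) subject to
# the relaxed quadratic measurement constraints.
K = cp.Variable((d, d), symmetric=True)
g = cp.Variable(d)
quad = cp.sum(cp.multiply(X @ K, X), axis=1)   # vector of x_i^T K x_i
constraints = [cp.abs(y**2 - cp.multiply(y, X @ g) + quad) <= eta]
cp.Problem(cp.Minimize(cp.normNuc(K)), constraints).solve()

# Unlift: g g^T - 4 K = (b1 - b2)(b1 - b2)^T, so the top eigenpair recovers
# the difference vector up to sign (i.e., up to swapping the two labels).
M = np.outer(g.value, g.value) - 4 * K.value
w, V = np.linalg.eigh(M)
diff = np.sqrt(max(w[-1], 0.0)) * V[:, -1]
beta1, beta2 = (g.value + diff) / 2, (g.value - diff) / 2
print(min(np.linalg.norm(beta1 - b1), np.linalg.norm(beta1 - b2)))

The unlifting step exploits that g g^T - 4 K is, at the true solution, the rank-one matrix (b1 - b2)(b1 - b2)^T, so a single eigendecomposition splits the lifted estimate back into the two regression vectors.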