Solving Empirical Risk Minimization in the Current Matrix Multiplication Time

Abstract

Many convex problems in machine learning and computer science share the same form:
\begin{align*}
\min_{x} \sum_{i} f_i( A_i x + b_i),
\end{align*}
where the $f_i$ are convex functions on $\mathbb{R}^{n_i}$ with constant $n_i$, $A_i \in \mathbb{R}^{n_i \times d}$, $b_i \in \mathbb{R}^{n_i}$, and $\sum_i n_i = n$. This problem generalizes linear programming and includes many problems in empirical risk minimization. In this paper, we give an algorithm that runs in time
\begin{align*}
O^* \left( ( n^{\omega} + n^{2.5 - \alpha/2} + n^{2 + 1/6} ) \log (n / \delta) \right),
\end{align*}
where $\omega$ is the exponent of matrix multiplication, $\alpha$ is the dual exponent of matrix multiplication, and $\delta$ is the relative accuracy. Note that the runtime depends only logarithmically on the condition numbers and other data-dependent parameters, all of which are captured in $\delta$. For the current bounds $\omega \sim 2.38$ [Vassilevska Williams'12, Le Gall'14] and $\alpha \sim 0.31$ [Le Gall, Urrutia'18], our runtime $O^*( n^{\omega} \log (n / \delta) )$ matches the current best runtime for solving a dense least squares regression problem, a special case of the problem we consider. Very recently, [Alman'18] proved that all currently known techniques cannot yield an $\omega$ below $2.168$, which is larger than our $2 + 1/6$. Our result generalizes the very recent result on solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a broader class of problems. Our algorithm introduces two ideas that differ from [Cohen, Lee, Song'19]:
$\bullet$ We give a robust deterministic central path method, whereas the previous work uses a stochastic central path that updates the weights by a random sparse vector.
$\bullet$ We propose an efficient data structure to maintain the central path of interior point methods even when the weight update vector is dense.
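As a concrete illustration (ours, not spelled out in the abstract beyond naming least squares regression as a special case), dense least squares regression instantiates the form above with $n_i = 1$ for every term: writing $a_i^\top$ for the $i$-th row of $A \in \mathbb{R}^{n \times d}$,
\begin{align*}
\min_{x \in \mathbb{R}^d} \|Ax - b\|_2^2
\;=\; \min_{x} \sum_{i=1}^{n} (a_i^\top x - b_i)^2
\;=\; \min_{x} \sum_{i=1}^{n} f_i( A_i x + \tilde b_i ),
\qquad f_i(y) = y^2,\; A_i = a_i^\top,\; \tilde b_i = -b_i,
\end{align*}
so each row contributes one one-dimensional convex term and $\sum_i n_i = n$, matching the setting of the theorem.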
