
Solving Empirical Risk Minimization in the Current Matrix Multiplication Time

Annual Conference Computational Learning Theory (COLT), 2019
Abstract

Many convex problems in machine learning and computer science share the same form:
\begin{align*} \min_{x} \sum_{i} f_i( A_i x + b_i), \end{align*}
where the $f_i$ are convex functions on $\mathbb{R}^{n_i}$ with constant $n_i$, $A_i \in \mathbb{R}^{n_i \times d}$, $b_i \in \mathbb{R}^{n_i}$, and $\sum_i n_i = n$. This problem generalizes linear programming and includes many problems in empirical risk minimization.

In this paper, we give an algorithm that runs in time
\begin{align*} O^* ( ( n^{\omega} + n^{2.5 - \alpha/2} + n^{2 + 1/6} ) \log (n / \delta) ), \end{align*}
where $\omega$ is the exponent of matrix multiplication, $\alpha$ is the dual exponent of matrix multiplication, and $\delta$ is the relative accuracy. Note that the runtime depends only logarithmically on the condition numbers and other data-dependent parameters; these are captured in $\delta$. For the current bounds $\omega \sim 2.38$ [Vassilevska Williams'12, Le Gall'14] and $\alpha \sim 0.31$ [Le Gall, Urrutia'18], our runtime $O^*(n^{\omega} \log(n/\delta))$ matches the current best for solving a dense least squares regression problem, a special case of the problem we consider. Very recently, [Alman'18] proved that all currently known techniques cannot give an $\omega$ below $2.168$, which is larger than our $2 + 1/6$.

Our result generalizes the very recent result of solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a broader class of problems. Our algorithm introduces two concepts that differ from [Cohen, Lee, Song'19]:
• We give a robust deterministic central path method, whereas the previous one is a stochastic central path that updates weights by a random sparse vector.
• We propose an efficient data structure to maintain the central path of interior point methods even when the weight update vector is dense.
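As a concrete illustration of the problem form, the sketch below (an assumption for exposition, not code from the paper) writes dense least squares regression $\|Ax - b\|^2$, the special case mentioned in the abstract, as $\sum_i f_i(A_i x + b_i)$ with $n_i = 1$, $A_i$ the $i$-th row of $A$, $b_i = -b[i]$, and $f_i(y) = y^2$, and checks that the two objectives agree:

```python
import numpy as np

# Hypothetical illustration: least squares regression ||Ax - b||^2
# rewritten in the general ERM form sum_i f_i(A_i x + b_i),
# with n_i = 1 and f_i(y) = y^2 for every block i.
rng = np.random.default_rng(0)
n, d = 8, 3
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x = rng.standard_normal(d)

# Standard least squares objective.
obj_ls = np.sum((A @ x - b) ** 2)

# Same objective as a sum of per-row convex terms f_i(A_i x + b_i),
# where A_i is row i of A (a 1 x d block) and b_i = -b[i].
obj_erm = sum((A[i] @ x + (-b[i])) ** 2 for i in range(n))

assert np.isclose(obj_ls, obj_erm)
```

Logistic regression fits the same template by choosing $f_i(y) = \log(1 + e^{-y})$ on each $1$-dimensional block, which is why the framework covers many empirical risk minimization problems.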
