Solving Empirical Risk Minimization in the Current Matrix Multiplication Time

Many convex problems in machine learning and computer science share the same form: \begin{align*} \min_{x} \sum_{i} f_i( A_i x + b_i), \end{align*} where $f_i$ are convex functions on $\mathbb{R}^{n_i}$ with constant $n_i$, $A_i \in \mathbb{R}^{n_i \times d}$, $b_i \in \mathbb{R}^{n_i}$, and $\sum_i n_i = n$. This problem generalizes linear programming and includes many problems in empirical risk minimization.

In this paper, we give an algorithm that runs in time \begin{align*} O^* ( ( n^{\omega} + n^{2.5 - \alpha/2} + n^{2+ 1/6} ) \log (n / \delta) ), \end{align*} where $\omega$ is the exponent of matrix multiplication, $\alpha$ is the dual exponent of matrix multiplication, and $\delta$ is the relative accuracy. Note that the runtime has only a logarithmic dependence on the condition numbers and other data-dependent parameters; these are captured in $\delta$. For the current bounds $\omega \approx 2.373$ [Vassilevska Williams'12, Le Gall'14] and $\alpha \approx 0.31$ [Le Gall, Urrutia'18], our runtime matches the current best runtime for solving a dense least squares regression problem, a special case of the problem we consider. Very recently, [Alman'18] proved that all currently known techniques cannot give an $\omega$ below $2.168$, which is larger than our exponent $2 + 1/6$.

Our result generalizes the very recent result on solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a broader class of problems. Our algorithm introduces two concepts that differ from [Cohen, Lee, Song'19]: (1) we give a robust deterministic central path method, whereas the previous one is a stochastic central path which updates the weights by a random sparse vector; (2) we propose an efficient data structure to maintain the central path of interior point methods even when the weight update vector is dense.
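As a concrete illustration (an added sketch, not part of the abstract itself), the dense least squares regression problem mentioned above is an instance of this form: take $n_i = 1$, let $A_i = a_i^\top$ be the $i$-th row of the design matrix, set $b_i = -y_i$ for the $i$-th target, and choose $f_i(t) = t^2$, so that \begin{align*} \min_{x \in \mathbb{R}^d} \sum_{i=1}^{n} \left( a_i^\top x - y_i \right)^2 = \min_{x} \sum_{i} f_i( A_i x + b_i ). \end{align*} Logistic regression fits the same template with $f_i(t) = \log(1 + e^{-y_i t})$ and $b_i = 0$. Under the bounds stated above, the three terms in the runtime are roughly $n^{2.373}$, $n^{2.5 - 0.31/2} \approx n^{2.35}$, and $n^{2.17}$, so the $n^{\omega}$ term dominates.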