
Polynomial-Time Solutions for ReLU Network Training: A Complexity Classification via Max-Cut and Zonotopes

Abstract

We investigate the complexity of training a two-layer ReLU neural network with weight decay regularization. Previous research has shown that the optimal solution of this problem can be found by solving a standard cone-constrained convex program. Using this convex formulation, we prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it. In particular, when $\epsilon \leq \sqrt{84/83} - 1 \approx 0.006$, we show that it is NP-hard to find an approximate global optimizer of the ReLU network objective with relative error $\epsilon$ with respect to the objective value. Moreover, we develop a randomized algorithm that mirrors the Goemans-Williamson rounding of semidefinite Max-Cut relaxations. To provide polynomial-time approximations, we classify training datasets into three categories: (i) for orthogonally separable datasets, an exact solution can be obtained in polynomial time; (ii) when the samples of different classes are negatively correlated, we give a polynomial-time approximation with relative error $\sqrt{\pi/2} - 1 \approx 0.253$; (iii) for general datasets, the degree to which the problem can be approximated in polynomial time is governed by a geometric factor that controls the diameter of two zonotopes intrinsic to the dataset. To our knowledge, these results present the first polynomial-time approximation guarantees, along with the first hardness-of-approximation results, for regularized ReLU networks.
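The abstract references two concrete technical objects without spelling them out: the cone-constrained convex reformulation of the training problem, and Goemans-Williamson-style rounding. As orientation, here is one standard form of the convex reformulation from prior work on two-layer ReLU networks (an assumption on our part; the paper may use a variant):

$$
\min_{\{v_i, w_i\}_{i=1}^{P}} \; \frac{1}{2} \Big\| \sum_{i=1}^{P} D_i X (v_i - w_i) - y \Big\|_2^2 + \beta \sum_{i=1}^{P} \big( \|v_i\|_2 + \|w_i\|_2 \big)
\quad \text{s.t.} \quad (2D_i - I) X v_i \ge 0, \; (2D_i - I) X w_i \ge 0,
$$

where $X$ is the data matrix, $y$ the label vector, $\beta$ the weight-decay parameter, and $D_1, \dots, D_P$ enumerate the diagonal ReLU activation patterns $\mathrm{diag}(\mathbb{1}[X u \ge 0])$.

The randomized algorithm is said to mirror Goemans-Williamson rounding. The sketch below shows the classical GW procedure for Max-Cut, not the paper's algorithm: solve the semidefinite relaxation, then round with a random hyperplane. It assumes numpy and cvxpy are available; the weight matrix W and the function name are illustrative.

```python
import numpy as np
import cvxpy as cp

def gw_maxcut(W, seed=0):
    """Classical Goemans-Williamson approximation for Max-Cut.

    W: symmetric (n, n) nonnegative weight matrix with zero diagonal.
    Returns a +/-1 partition vector and the weight of the resulting cut.
    """
    n = W.shape[0]
    # SDP relaxation: maximize (1/4) * sum_ij W_ij (1 - S_ij)
    # over PSD matrices S with unit diagonal.
    S = cp.Variable((n, n), symmetric=True)
    constraints = [S >> 0, cp.diag(S) == 1]
    cp.Problem(cp.Maximize(cp.sum(cp.multiply(W, 1 - S)) / 4), constraints).solve()

    # Factor S = V V^T; clip tiny negative eigenvalues left by the solver.
    eigvals, eigvecs = np.linalg.eigh(S.value)
    V = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

    # Random-hyperplane rounding: each vertex gets the sign of its
    # embedding vector's projection onto a random Gaussian direction.
    rng = np.random.default_rng(seed)
    signs = np.sign(V @ rng.standard_normal(n))
    signs[signs == 0] = 1.0
    cut_value = 0.25 * np.sum(W * (1 - np.outer(signs, signs)))
    return signs, cut_value
```

In the classical analysis, this rounding achieves an expected cut of at least roughly 0.878 times the optimum; the approximation factors for categories (ii) and (iii) above play the analogous role for the ReLU training objective.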
