A faster and simpler algorithm for learning shallow networks

Abstract

We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. [CDG+23] recently gave the first algorithm for this problem to run in $\mathrm{poly}(d, 1/\varepsilon)$ time when $k = O(1)$, where $\varepsilon$ is the target error. More precisely, their algorithm runs in time $(d/\varepsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d/\varepsilon)^{O(k^2)}$.
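To make the learning problem concrete, here is a minimal sketch of the data model described in the abstract (not the authors' algorithm): the learner receives labeled examples $(x, y)$ where $x$ is drawn from the standard $d$-dimensional Gaussian and $y$ is a linear combination of $k$ ReLU activations. The specific values of $d$, $k$, the sample size $n$, and the ground-truth weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 20, 3, 10_000          # ambient dimension, number of ReLUs, sample size

# Ground-truth shallow network: y(x) = sum_i a_i * ReLU(<w_i, x>)
# (weights and coefficients here are arbitrary placeholders)
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-norm hidden directions
a = rng.standard_normal(k)                      # combination coefficients

# Labeled examples: x ~ N(0, I_d), y = sum_i a_i * ReLU(<w_i, x>)
X = rng.standard_normal((n, d))
y = np.maximum(X @ W.T, 0.0) @ a

# The learner's goal is a hypothesis h with E_x[(h(x) - y(x))^2] <= eps.
# As a baseline, the squared error of the trivial zero hypothesis:
print("E[y^2] ~", np.mean(y**2))
```

The sample $(X, y)$ above is exactly the input a learner in this model would see; the runtime claim in the abstract bounds, as a function of $d$, $1/\varepsilon$, and $k$, how much computation is needed to recover such a network to error $\varepsilon$.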
