
Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models

IEEE Annual Symposium on Foundations of Computer Science (FOCS), 2020
Abstract

Let $V$ be any vector space of multivariate degree-$d$ homogeneous polynomials with co-dimension at most $k$, and let $S$ be the set of points where all polynomials in $V$ \emph{nearly} vanish. We establish a qualitatively optimal upper bound on the size of $\epsilon$-covers for $S$ in the $\ell_2$-norm. Roughly speaking, we show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$. Our result is constructive, yielding an algorithm to compute such an $\epsilon$-cover in time $\mathrm{poly}(M)$. Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for $k$-mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with $k$ hidden units (under the Gaussian distribution), density and parameter estimation for $k$-mixtures of linear regressions (with Gaussian covariates), and parameter estimation for $k$-mixtures of hyperplanes. Our algorithms run in time \emph{quasi-polynomial} in the parameter $k$. Previous algorithms for these problems had running times exponential in $k^{\Omega(1)}$. At a high level, our algorithms for all of these learning problems work as follows: by computing the low-degree moments of the hidden parameters, we find a vector space of polynomials that nearly vanish on the unknown parameters. Our structural result then allows us to compute a quasi-polynomial-size cover for the set of hidden parameters, which we exploit in our learning algorithms.
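To make the "vector space of nearly vanishing polynomials" step concrete, here is a minimal numpy sketch, not the paper's algorithm: given $k$ hidden parameter points in $\mathbb{R}^n$, it recovers the subspace of degree-$d$ homogeneous polynomials that (nearly) vanish on all of them via an SVD of the monomial-evaluation matrix, and checks that its co-dimension is at most $k$. The function names, the tolerance `tol`, and the use of exact parameter points in place of moment-based estimates are all illustrative assumptions.

```python
# Toy illustration (assumed setup, not the paper's algorithm): compute the
# space V of degree-d homogeneous polynomials nearly vanishing on k points.
import itertools
import numpy as np

def monomial_exponents(n, d):
    """All multi-indices alpha with |alpha| = d (degree-d monomials in n vars)."""
    return [a for a in itertools.product(range(d + 1), repeat=n) if sum(a) == d]

def evaluation_matrix(points, exps):
    """Row i holds every degree-d monomial evaluated at points[i]."""
    return np.array([[np.prod(p ** np.array(a)) for a in exps] for p in points])

def near_vanishing_space(points, d, tol=1e-6):
    """Orthonormal basis (columns) of polynomials p with p(mu_i) ~ 0 for all i.

    Right singular vectors whose singular values fall below `tol` span the
    near-null space of the evaluation matrix; its co-dimension is at most
    k = len(points), matching the structural setup in the abstract.
    """
    exps = monomial_exponents(points.shape[1], d)
    A = evaluation_matrix(points, exps)
    _, svals, Vt = np.linalg.svd(A)
    rank = int(np.sum(svals > tol * svals.max()))
    return Vt[rank:].T, exps  # basis of V, plus the monomial ordering

rng = np.random.default_rng(0)
k, n, d = 3, 4, 2                      # 3 hidden points in R^4, degree-2 polys
mus = rng.normal(size=(k, n))          # stand-ins for the unknown parameters
basis, exps = near_vanishing_space(mus, d)
print(f"{len(exps)} monomials, co-dimension {len(exps) - basis.shape[1]} <= k = {k}")
```

In the actual learning algorithms the hidden points are of course unavailable, so the evaluation matrix would be approximated from low-degree moments of the observed data; the recovered polynomials then only nearly vanish, and the structural result above bounds the size of an $\epsilon$-cover of their near-zero set.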
