We study sample-efficient distribution learning, where a learner is given an iid sample from an unknown target distribution and aims to approximate that distribution. Assuming the target distribution can be approximated by a member of some predetermined class of distributions, we analyze how large a sample must be in order to find a distribution that is close to the target in total variation distance. In this work, we introduce a novel method for distribution learning via a form of "compression." Given a large enough sample from a target distribution, can one compress that sample set, by picking only a few instances from it, in a way that allows recovery of (an approximation to) the target distribution from the compressed set? We prove that if this is the case for all members of a class of distributions, then there is a sample-efficient way of learning distributions from this class. As an application of our approach, we provide a sample-efficient method for agnostic distribution learning with respect to the class of mixtures of k axis-aligned Gaussian distributions over ℝ^d. This method uses only Õ(kd/ε²) samples (to guarantee with high probability an error of at most ε). This is the first sample complexity upper bound that is tight in k, d, and 1/ε, up to logarithmic factors. Along the way, we prove several properties of compression schemes. Namely, we prove that if there is a compression scheme for a base class of distributions, then there are compression schemes for the class of mixtures, as well as the class of products, of that base class. These closure properties make compression schemes a powerful tool. For example, the problem of learning mixtures of axis-aligned Gaussians reduces to that of "robustly" compressing one-dimensional Gaussians, which we show is possible using a compressed set of constant size.
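To make the compression idea concrete, here is a minimal, hypothetical sketch (not the paper's actual robust scheme) of compressing a sample from a one-dimensional Gaussian down to a constant-size subset: the encoder keeps only the points at the empirical 50th and 84th percentiles, and the decoder reconstructs a Gaussian from those two retained points, since the 84th percentile of N(μ, σ²) sits roughly one standard deviation above the mean. The function names and percentile choices are illustrative assumptions.

```python
import random

def compress(sample):
    # Encoder (illustrative, not the paper's scheme): retain only two
    # points of the sample -- those at the empirical 50th and 84th
    # percentiles. For a Gaussian, these jointly pin down mean and std,
    # since Phi^{-1}(0.84) is approximately 1.
    s = sorted(sample)
    n = len(s)
    return (s[n // 2], s[int(0.84 * n)])

def decode(compressed):
    # Decoder: reconstruct (mean, std) from the two retained points.
    median, p84 = compressed
    return median, max(p84 - median, 1e-9)

random.seed(0)
sample = [random.gauss(3.0, 2.0) for _ in range(100_000)]
mu_hat, sigma_hat = decode(compress(sample))
print(mu_hat, sigma_hat)  # close to (3.0, 2.0) for a large sample
```

The paper's actual schemes must work *robustly*, i.e., even when the sample comes from a distribution only close to a Gaussian in total variation distance, which this naive percentile sketch does not guarantee.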
View on arXiv