$\ell_p$ Testing and Learning of Discrete Distributions

Abstract

The classic problems of testing uniformity of and learning a discrete distribution, given access to independent samples from it, are examined under general $\ell_p$ metrics. The intuitions and results often contrast with the classic $\ell_1$ case. For $p > 1$, we can learn and test with a number of samples that is independent of the support size of the distribution: for $1 < p \leq 2$, with an $\ell_p$ distance parameter $\epsilon$, $O(\sqrt{1/\epsilon^q})$ samples suffice for testing uniformity and $O(1/\epsilon^q)$ samples suffice for learning, where $q = p/(p-1)$ is the conjugate of $p$. These bounds are tight precisely when the support size $n$ of the distribution exceeds $1/\epsilon^q$, which seems to act as an upper bound on the "apparent" support size. For some $\ell_p$ metrics, uniformity testing becomes easier over larger supports: a 6-sided die requires fewer trials to test for fairness than a 2-sided coin, and a card-shuffler requires fewer trials than the die. In fact, this inverse dependence on support size holds if and only if $p > \frac{4}{3}$. The uniformity testing algorithm simply thresholds the number of "collisions" or "coincidences" and has an optimal sample complexity up to constant factors for all $1 \leq p \leq 2$. Another algorithm gives order-optimal sample complexity for $\ell_{\infty}$ uniformity testing. Meanwhile, the most natural learning algorithm is shown to have order-optimal sample complexity for all $\ell_p$ metrics. The author thanks Clément Canonne for discussions and contributions to this work.
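To make the collision statistic concrete, here is a minimal Python sketch of a collision-counting uniformity tester for the $\ell_2$ case ($p = q = 2$), where the collision probability of a distribution $D$ over $[n]$ equals $1/n + \|D - U\|_2^2$, so any distribution that is $\epsilon$-far from uniform in $\ell_2$ has collision probability at least $1/n + \epsilon^2$. The function name `collision_test` and the midpoint threshold below are illustrative assumptions, not the paper's calibrated constants.

```python
import itertools
import random

def collision_test(samples, n, epsilon):
    """Sketch of a collision-based l_2 uniformity tester.

    Counts colliding pairs among the samples and accepts if the
    empirical collision rate is close to the uniform rate 1/n.
    The threshold is an assumed midpoint calibration, not the
    paper's exact constant.
    """
    m = len(samples)
    # Number of unordered sample pairs that landed on the same outcome.
    collisions = sum(1 for a, b in itertools.combinations(samples, 2) if a == b)
    rate = collisions / (m * (m - 1) / 2)
    # Uniform: E[rate] = 1/n.  Epsilon-far in l_2: E[rate] >= 1/n + epsilon^2.
    threshold = 1.0 / n + epsilon**2 / 2
    return rate <= threshold  # True means "accept as uniform"

# Example: test a fair 6-sided die with l_2 parameter 0.1.
samples = [random.randrange(6) for _ in range(2000)]
print(collision_test(samples, n=6, epsilon=0.1))
```

Note how this matches the abstract's claim for $p = q = 2$: separating collision rates that differ by $\epsilon^2$ takes on the order of $1/\epsilon = \sqrt{1/\epsilon^q}$ samples, with no dependence on the support size $n$.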
