$\ell_p$ Testing and Learning of Discrete Distributions

The classic problems of testing uniformity of and learning a discrete distribution, given access to independent samples from it, are examined under general $\ell_p$ metrics. The intuitions and results often contrast with the classic $\ell_1$ case. For $p > 1$, we can learn and test with a number of samples that is independent of the support size of the distribution: with an $\ell_p$ distance parameter $\epsilon$, $O(\max\{1/\epsilon^{q/2}, 1/\epsilon^2\})$ samples suffice for testing uniformity and $O(\max\{1/\epsilon^q, 1/\epsilon^2\})$ samples suffice for learning, where $q = p/(p-1)$ is the conjugate of $p$. These bounds are tight precisely when the support size of the distribution exceeds $1/\epsilon^q$, which thus seems to act as an upper bound on the "apparent" support size.

For some $\ell_p$ metrics, uniformity testing becomes easier over larger supports: a 6-sided die requires fewer trials to test for fairness than a 2-sided coin, and a card-shuffler requires fewer trials than the die. In fact, this inverse dependence on support size holds if and only if $p > 4/3$. The uniformity testing algorithm simply thresholds the number of "collisions" or "coincidences" and has an optimal sample complexity, up to constant factors, for all $1 \le p \le 2$. Another algorithm gives order-optimal sample complexity for $\ell_\infty$ uniformity testing. Meanwhile, the most natural learning algorithm is shown to have order-optimal sample complexity for all $\ell_p$ metrics.

The author thanks Clément Canonne for discussions and contributions to this work.
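The collision-based uniformity tester lends itself to a short concrete sketch. The following is a minimal illustration, not the paper's exact procedure: the acceptance threshold and the number of samples are left as caller-supplied assumptions, since the paper's choices depend on $p$, $\epsilon$, and the support size. The underlying fact is that the collision rate of a distribution $D$ estimates $\sum_i D_i^2$, which equals $1/n$ exactly when $D$ is uniform on $n$ outcomes and is strictly larger otherwise.

```python
from collections import Counter

def collision_test(samples, threshold):
    """Accept "uniform" iff the fraction of colliding sample pairs
    is at most the given threshold.

    The collision rate is an unbiased estimate of sum_i D_i^2,
    which is minimized (= 1/n) by the uniform distribution.
    """
    m = len(samples)
    counts = Counter(samples)
    # Colliding pairs: an outcome seen c times contributes C(c, 2) pairs.
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    pairs = m * (m - 1) // 2
    return collisions / pairs <= threshold

# Example: testing a fair 6-sided die. The threshold 1/6 + 0.01 is a
# hypothetical choice; the paper's threshold depends on p and epsilon.
import random
samples = [random.randrange(6) for _ in range(2000)]
print(collision_test(samples, threshold=1 / 6 + 0.01))
```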
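The "most natural" learning algorithm is presumably the empirical estimator, which simply outputs each outcome's observed frequency; the abstract does not spell it out, so the sketch below is an assumption. An $\ell_p$ distance helper is included to show how the estimation error would be measured.

```python
from collections import Counter

def empirical_distribution(samples, n):
    """Estimate the distribution by the frequency of each of the
    n outcomes among the samples."""
    m = len(samples)
    counts = Counter(samples)
    return [counts[i] / m for i in range(n)]

def lp_distance(d1, d2, p):
    """l_p distance between two distributions given as lists."""
    return sum(abs(a - b) ** p for a, b in zip(d1, d2)) ** (1 / p)

# Example: learn a fair 6-sided die and measure the l_2 error.
import random
samples = [random.randrange(6) for _ in range(1000)]
estimate = empirical_distribution(samples, 6)
print(lp_distance(estimate, [1 / 6] * 6, p=2))
```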