$\ell_p$ Testing and Learning of Discrete Distributions

The classic problems of testing uniformity of and learning a discrete distribution, given access to independent samples from it, are examined under general $\ell_p$ metrics. The intuitions and results often contrast with the classic $\ell_1$ case. For $p > 1$, we can learn and test with a number of samples that is independent of the support size of the distribution: with an $\ell_p$ distance parameter $\epsilon$, $O(\max\{1/\epsilon^{q/2}, 1/\epsilon^2\})$ samples suffice for testing uniformity and $O(\max\{1/\epsilon^q, 1/\epsilon^2\})$ samples suffice for learning, where $q = p/(p-1)$ is the conjugate of $p$. These bounds are tight precisely when the support size of the distribution exceeds $1/\epsilon^q$, which thus seems to act as an upper bound on the "apparent" support size.

For some $\ell_p$ metrics, uniformity testing becomes easier over larger supports: a 6-sided die requires fewer trials to test for fairness than a 2-sided coin, and a card-shuffler requires fewer trials than the die. In fact, this inverse dependence on support size holds if and only if $p > 4/3$. The uniformity testing algorithm simply thresholds the number of "collisions" or "coincidences" and has an optimal sample complexity, up to constant factors, for all $1 \le p \le 2$. Another algorithm gives order-optimal sample complexity for $\ell_\infty$ uniformity testing. Meanwhile, the most natural learning algorithm is shown to have order-optimal sample complexity for all $\ell_p$ metrics.

The author thanks Clément Canonne for discussions and contributions to this work.
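The collision-based uniformity tester lends itself to a short concrete sketch. The following is a minimal illustration, not the paper's exact procedure: the acceptance threshold and the number of samples are left as caller-supplied assumptions, since the paper's choices depend on $p$, $\epsilon$, and the support size. The underlying fact is that the collision rate of a distribution $D$ estimates $\sum_i D_i^2$, which equals $1/n$ exactly when $D$ is uniform on $n$ outcomes and is strictly larger otherwise.

```python
from collections import Counter

def collision_test(samples, threshold):
    """Accept "uniform" iff the fraction of colliding sample pairs
    is at most the given threshold.

    The collision rate is an unbiased estimate of sum_i D_i^2,
    which is minimized (= 1/n) by the uniform distribution.
    """
    m = len(samples)
    counts = Counter(samples)
    # Colliding pairs: an outcome seen c times contributes C(c, 2) pairs.
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    pairs = m * (m - 1) // 2
    return collisions / pairs <= threshold

# Example: testing a fair 6-sided die. The threshold 1/6 + 0.01 is a
# hypothetical choice; the paper's threshold depends on p and epsilon.
import random
samples = [random.randrange(6) for _ in range(2000)]
print(collision_test(samples, threshold=1 / 6 + 0.01))
```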
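The "most natural" learning algorithm is presumably the empirical estimator, which simply outputs each outcome's observed frequency; the abstract does not spell it out, so the sketch below is an assumption. An $\ell_p$ distance helper is included to show how the estimation error would be measured.

```python
from collections import Counter

def empirical_distribution(samples, n):
    """Estimate the distribution by the frequency of each of the
    n outcomes among the samples."""
    m = len(samples)
    counts = Counter(samples)
    return [counts[i] / m for i in range(n)]

def lp_distance(d1, d2, p):
    """l_p distance between two distributions given as lists."""
    return sum(abs(a - b) ** p for a, b in zip(d1, d2)) ** (1 / p)

# Example: learn a fair 6-sided die and measure the l_2 error.
import random
samples = [random.randrange(6) for _ in range(1000)]
estimate = empirical_distribution(samples, 6)
print(lp_distance(estimate, [1 / 6] * 6, p=2))
```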