Testing and Learning of Discrete Distributions

The classic problems of testing uniformity of and learning a discrete distribution, given access to independent samples from it, are examined under general ℓp metrics. The intuitions and results often contrast with the classic ℓ1 case. For p > 1, we can learn and test with a number of samples that is independent of the support size of the distribution: with an ℓp tolerance ε, O(max{1/ε^(q/2), 1/ε^2}) samples suffice for testing uniformity and O(max{1/ε^q, 1/ε^2}) samples suffice for learning, where q = p/(p-1) is the conjugate of p. As this parallels the intuition that O(√n) and O(n) samples suffice for the ℓ1 case over a support of size n, it seems that 1/ε^q acts as an upper bound on the "apparent" support size. For some ℓp metrics, uniformity testing becomes easier over larger supports: a 6-sided die requires fewer trials to test for fairness than a 2-sided coin, and a card-shuffler requires fewer trials than the die. In fact, this inverse dependence on support size holds if and only if p > 4/3. The uniformity testing algorithm simply thresholds the number of "collisions" or "coincidences" and has an optimal sample complexity up to constant factors for all 1 ≤ p ≤ 2. Another algorithm gives order-optimal sample complexity for ℓ∞ uniformity testing. Meanwhile, the most natural learning algorithm is shown to have order-optimal sample complexity for all ℓp metrics. The author thanks Clément Canonne for discussions and contributions to this work.
View on arXiv
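The "most natural learning algorithm" mentioned above is commonly read as outputting the empirical distribution of the samples; assuming that reading, a minimal self-contained Python sketch of the learner, together with an ℓp distance helper for measuring its error, is:

```python
from collections import Counter

def empirical_distribution(samples):
    """Estimate each outcome's probability by its observed frequency."""
    m = len(samples)
    counts = Counter(samples)
    return {x: c / m for x, c in counts.items()}

def lp_distance(dist_a, dist_b, p):
    # l_p distance between two distributions represented as
    # {outcome: probability} dicts; missing outcomes have probability 0.
    support = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(x, 0.0) - dist_b.get(x, 0.0)) ** p
               for x in support) ** (1 / p)
```

For instance, `lp_distance(empirical_distribution(samples), truth, 2)` gives the ℓ2 learning error, which shrinks as the number of samples grows.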