
The Sample Complexity of Simple Binary Hypothesis Testing

Main: 52 pages
Bibliography: 1 page
Appendix: 9 pages
Abstract

The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d. samples required to distinguish between two distributions $p$ and $q$ in either: (i) the prior-free setting, with type-I error at most $\alpha$ and type-II error at most $\beta$; or (ii) the Bayesian setting, with Bayes error at most $\delta$ and prior distribution $(\pi, 1-\pi)$. This problem has only been studied when $\alpha = \beta$ (prior-free) or $\pi = 1/2$ (Bayesian), and the sample complexity is known to be characterized by the Hellinger divergence between $p$ and $q$, up to multiplicative constants. In this paper, we derive a formula that characterizes the sample complexity (up to multiplicative constants that are independent of $p$, $q$, and all error parameters) for: (i) all $0 \le \alpha, \beta \le 1/8$ in the prior-free setting; and (ii) all $\delta \le \pi/4$ in the Bayesian setting. In particular, the formula admits equivalent expressions in terms of certain divergences from the Jensen--Shannon and Hellinger families. The main technical result concerns an $f$-divergence inequality between members of the Jensen--Shannon and Hellinger families, which is proved by a combination of information-theoretic tools and case-by-case analyses. We explore applications of our results to (i) robust hypothesis testing, (ii) distributed (locally-private and communication-constrained) hypothesis testing, (iii) sequential hypothesis testing, and (iv) hypothesis testing with erasures.
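To make the classical characterization concrete: in the symmetric regime ($\alpha = \beta$ constant, or $\pi = 1/2$ with constant $\delta$), the sample complexity scales as $\Theta(1/H^2(p, q))$, where $H^2(p, q) = 1 - \sum_x \sqrt{p(x)q(x)}$ is the squared Hellinger distance. The following sketch (with an illustrative pair of Bernoulli-like distributions chosen here, not taken from the paper) computes this order-of-magnitude estimate for discrete distributions; the hidden multiplicative constant is omitted.

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance H^2(p, q) = 1 - sum_x sqrt(p(x) * q(x))
    for discrete distributions given as aligned probability vectors."""
    return 1.0 - sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

# Illustrative example: two close distributions on a binary alphabet.
p = [0.5, 0.5]
q = [0.6, 0.4]

h2 = hellinger_sq(p, q)
# Sample complexity is Theta(1 / H^2) in the symmetric constant-error
# regime, up to a multiplicative constant not computed here.
n_estimate = math.ceil(1.0 / h2)
```

For the pair above, $H^2(p, q) \approx 0.005$, so on the order of a couple of hundred samples are needed, matching the intuition that closer distributions require more samples to distinguish.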
