The Complexity of Estimating Rényi Entropy

Abstract

It was recently shown that estimating the Shannon entropy $H(p)$ of a discrete $k$-symbol distribution $p$ requires $\Theta(k/\log k)$ samples, a number that grows near-linearly in the support size. In many applications $H(p)$ can be replaced by the more general Rényi entropy of order $\alpha$, $H_\alpha(p)$. We determine the number of samples needed to estimate $H_\alpha(p)$ for all $\alpha$, showing that $\alpha < 1$ requires a super-linear number of samples, roughly $k^{1/\alpha}$; noninteger $\alpha > 1$ requires a near-linear number, roughly $k$; but, perhaps surprisingly, integer $\alpha > 1$ requires only $\Theta(k^{1-1/\alpha})$ samples. In particular, estimating $H_2(p)$, which arises in security, DNA reconstruction, closeness testing, and other applications, requires only $\Theta(\sqrt{k})$ samples. The estimators achieving these bounds are simple and run in time linear in the number of samples.
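
For orientation, the Rényi entropy of order $\alpha \neq 1$ is $H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_x p_x^\alpha$, so for integer $\alpha > 1$ it suffices to estimate the power sum $\sum_x p_x^\alpha$. The abstract does not spell out the estimators, so the following is only a minimal sketch of one standard approach, assumed here for illustration: an unbiased falling-factorial (collision-counting) estimate of the power sum from symbol counts. The function names are hypothetical, and entropies are in nats.

```python
import math
from collections import Counter


def falling_factorial(x, r):
    """Return x * (x - 1) * ... * (x - r + 1); zero whenever 0 <= x < r."""
    result = 1
    for i in range(r):
        result *= x - i
    return result


def renyi_entropy(p, alpha):
    """Exact Renyi entropy (nats) of a known distribution p, for alpha != 1."""
    power_sum = sum(q ** alpha for q in p if q > 0)
    return math.log(power_sum) / (1 - alpha)


def estimate_renyi_integer(samples, alpha):
    """Estimate H_alpha (nats) from i.i.d. samples for integer alpha >= 2.

    Illustrative assumption, not necessarily the paper's estimator:
    sum_x N_x^(falling alpha) / n^(falling alpha) is an unbiased estimate
    of sum_x p_x^alpha, where N_x is the count of symbol x among n samples.
    """
    if not isinstance(alpha, int) or alpha < 2:
        raise ValueError("alpha must be an integer >= 2")
    n = len(samples)
    if n < alpha:
        raise ValueError("need at least alpha samples")
    counts = Counter(samples)  # a single pass over the samples
    numerator = sum(falling_factorial(c, alpha) for c in counts.values())
    power_sum_hat = numerator / falling_factorial(n, alpha)
    if power_sum_hat == 0:
        # No symbol appeared alpha times: the plug-in log is undefined.
        return math.inf
    return math.log(power_sum_hat) / (1 - alpha)
```

For $\alpha = 2$ the numerator counts ordered pairs of samples that collide, which is why a number of samples on the order of $\sqrt{k}$ can already produce enough collisions to be informative. Counting symbols with a hash map takes a single pass, consistent with the linear running time mentioned above.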
