6
11

Breaking The Dimension Dependence in Sparse Distribution Estimation under Communication Constraints

Abstract

We consider the problem of estimating a dd-dimensional ss-sparse discrete distribution from its samples observed under a bb-bit communication constraint. The best-known previous result on 2\ell_2 estimation error for this problem is O(slog(d/s)n2b)O\left( \frac{s\log\left( {d}/{s}\right)}{n2^b}\right). Surprisingly, we show that when sample size nn exceeds a minimum threshold n(s,d,b)n^*(s, d, b), we can achieve an 2\ell_2 estimation error of O(sn2b)O\left( \frac{s}{n2^b}\right). This implies that when n>n(s,d,b)n>n^*(s, d, b) the convergence rate does not depend on the ambient dimension dd and is the same as knowing the support of the distribution beforehand. We next ask the question: ``what is the minimum n(s,d,b)n^*(s, d, b) that allows dimension-free convergence?''. To upper bound n(s,d,b)n^*(s, d, b), we develop novel localization schemes to accurately and efficiently localize the unknown support. For the non-interactive setting, we show that n(s,d,b)=O(min(d2log2d/2b,s4log2d/2b))n^*(s, d, b) = O\left( \min \left( {d^2\log^2 d}/{2^b}, {s^4\log^2 d}/{2^b}\right) \right). Moreover, we connect the problem with non-adaptive group testing and obtain a polynomial-time estimation scheme when n=Ω~(s4log4d/2b)n = \tilde{\Omega}\left({s^4\log^4 d}/{2^b}\right). This group testing based scheme is adaptive to the sparsity parameter ss, and hence can be applied without knowing it. For the interactive setting, we propose a novel tree-based estimation scheme and show that the minimum sample-size needed to achieve dimension-free convergence can be further reduced to n(s,d,b)=O~(s2log2d/2b)n^*(s, d, b) = \tilde{O}\left( {s^2\log^2 d}/{2^b} \right).

View on arXiv
Comments on this paper