arXiv:2007.01958

Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare/Weak Perturbations

3 July 2020
D. Donoho
A. Kipnis
Abstract

Given two samples from possibly different discrete distributions over a common set of size $N$, consider the problem of testing whether these distributions are identical, vs. the following rare/weak perturbation alternative: the frequencies of $N^{1-\beta}$ elements are perturbed by $r(\log N)/2n$ in the Hellinger distance, where $n$ is the size of each sample. We adapt the Higher Criticism (HC) test to this setting using P-values obtained from $N$ exact binomial tests. We characterize the asymptotic performance of the HC-based test in terms of the sparsity parameter $\beta$ and the perturbation intensity parameter $r$. Specifically, we derive a region in the $(\beta, r)$-plane where the test asymptotically has maximal power, while having asymptotically no power outside this region. Our analysis distinguishes between the cases of dense ($N \gg n$) and sparse ($N \ll n$) contingency tables. In the dense case, the phase transition curve matches that of an analogous two-sample normal means model.
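As a rough illustration of the procedure the abstract describes, the following Python sketch computes one exact binomial P-value per element and combines them with a Higher Criticism statistic. It is not taken from the paper: the function names are hypothetical, the two samples are assumed to be of equal size (so the null success probability is 1/2), and the normalization by $\sqrt{(i/N)(1 - i/N)}$ and the search fraction `gamma` are one common HC variant chosen here as assumptions.

```python
from math import comb, sqrt


def binom_pvalue(x, y):
    """Two-sided exact binomial P-value, assuming H0: x ~ Bin(x + y, 1/2).

    The 1/2 success probability is an assumption that holds when both
    samples have the same size.
    """
    m = x + y
    if m == 0:
        return 1.0  # no observations for this element: no evidence
    dev = abs(x - m / 2)
    # Sum the probabilities of all outcomes at least as far from m/2 as x.
    tail = sum(comb(m, k) for k in range(m + 1) if abs(k - m / 2) >= dev)
    return tail / 2 ** m


def higher_criticism(pvalues, gamma=0.5):
    """HC statistic over the smallest gamma-fraction of sorted P-values.

    Uses the variant sqrt(N) * (i/N - p_(i)) / sqrt(i/N * (1 - i/N));
    this normalization is an assumption, other variants exist.
    """
    n_tests = len(pvalues)
    if n_tests < 2 or not 0 < gamma < 1:
        raise ValueError("need at least two P-values and 0 < gamma < 1")
    sorted_p = sorted(pvalues)
    best = float("-inf")
    for i in range(1, max(1, int(gamma * n_tests)) + 1):
        u = i / n_tests
        score = sqrt(n_tests) * (u - sorted_p[i - 1]) / sqrt(u * (1 - u))
        best = max(best, score)
    return best


def two_sample_hc(x_counts, y_counts, gamma=0.5):
    """HC statistic for two count vectors over the same set of elements."""
    if len(x_counts) != len(y_counts):
        raise ValueError("count vectors must have the same length")
    pvals = [binom_pvalue(x, y) for x, y in zip(x_counts, y_counts)]
    return higher_criticism(pvals, gamma)


# Toy usage: identical counts versus one strongly perturbed element.
null_stat = two_sample_hc([5] * 20, [5] * 20)
alt_stat = two_sample_hc([20] + [5] * 19, [0] + [5] * 19)
```

Large values of the statistic count as evidence against the hypothesis of identical distributions. The rejection threshold is left out of the sketch; in practice it would need to be calibrated, for example by simulation under the null.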
