  3. 1711.08113
17
34

Learning Discrete Distributions from Untrusted Batches

22 November 2017
Mingda Qiao
Gregory Valiant
Abstract

We consider the problem of learning a discrete distribution in the presence of an $\epsilon$ fraction of malicious data sources. Specifically, we consider the setting where there is some underlying distribution, $p$, and each data source provides a batch of $\ge k$ samples, with the guarantee that at least a $(1-\epsilon)$ fraction of the sources draw their samples from a distribution with total variation distance at most $\eta$ from $p$. We make no assumptions on the data provided by the remaining $\epsilon$ fraction of sources; this data can even be chosen as an adversarial function of the $(1-\epsilon)$ fraction of "good" batches. We provide two algorithms: one with runtime exponential in the support size, $n$, but polynomial in $k$, $1/\epsilon$, and $1/\eta$, that takes $O((n+k)/\epsilon^2)$ batches and recovers $p$ to error $O(\eta + \epsilon/\sqrt{k})$. This recovery accuracy is information-theoretically optimal, to constant factors, even given an infinite number of data sources. Our second algorithm applies to the $\eta = 0$ setting and also achieves an $O(\epsilon/\sqrt{k})$ recovery guarantee, though it runs in $\mathrm{poly}((nk)^k)$ time. This second algorithm, which approximates a certain tensor via a rank-1 tensor minimizing $\ell_1$ distance, is surprising in light of the hardness of many low-rank tensor approximation problems, and may be of independent interest.
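To make the setting concrete, the following minimal Python sketch simulates the untrusted-batches model in the $\eta = 0$ setting and shows why the naive estimator that averages all batch histograms is stuck at total variation error on the order of $\epsilon$ regardless of the batch size $k$, which is exactly the gap the $O(\epsilon/\sqrt{k})$ guarantee closes. All names and parameter values are illustrative assumptions; the sketch simulates the data model only and does not implement either of the paper's algorithms.

    import numpy as np

    # Minimal simulation of the untrusted-batches model described above.
    # Parameter values are arbitrary choices for illustration.
    rng = np.random.default_rng(0)

    n = 10      # support size of the underlying distribution p
    k = 100     # samples per batch
    m = 2000    # number of batches (data sources)
    eps = 0.1   # fraction of adversarial sources

    p = rng.dirichlet(np.ones(n))  # underlying distribution over {0, ..., n-1}

    # Good sources draw k i.i.d. samples exactly from p (the eta = 0 setting)
    # and report the empirical histogram of their batch.
    num_bad = int(eps * m)
    good = rng.multinomial(k, p, size=m - num_bad) / k

    # Adversarial sources may report anything; here they all push their mass
    # onto a single support element to bias the naive average.
    q = np.zeros(n)
    q[0] = 1.0
    bad = rng.multinomial(k, q, size=num_bad) / k

    batches = np.vstack([good, bad])

    # Naive estimator: average all batch histograms. The adversary shifts the
    # mean by roughly eps in TV distance, independent of k.
    p_hat = batches.mean(axis=0)
    tv_naive = 0.5 * np.abs(p_hat - p).sum()

    print(f"naive TV error:          {tv_naive:.4f}  (on the order of eps = {eps})")
    print(f"optimal achievable rate: O(eps/sqrt(k)) = O({eps / np.sqrt(k):.4f})")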
