Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu

9 November 2020
Arnab Bhattacharyya, Sutanu Gayen, Eric Price, N. V. Vinodchandran
Abstract

We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution $P$ on $\Sigma^n$ and a tree $T$ on $n$ nodes, we say $T$ is an $\varepsilon$-approximate tree for $P$ if there is a $T$-structured distribution $Q$ such that $D(P\,\|\,Q)$ is at most $\varepsilon$ more than that of the best possible tree-structured distribution for $P$. We show that if $P$ itself is tree-structured, then the Chow-Liu algorithm with the plug-in estimator for mutual information with $\widetilde{O}(|\Sigma|^3 n \varepsilon^{-1})$ i.i.d. samples outputs an $\varepsilon$-approximate tree for $P$ with constant probability. In contrast, for a general $P$ (which may not be tree-structured), $\Omega(n^2 \varepsilon^{-2})$ samples are necessary to find an $\varepsilon$-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): we prove that for three random variables $X, Y, Z$ each over $\Sigma$, testing if $I(X; Y \mid Z)$ is $0$ or $\geq \varepsilon$ is possible with $\widetilde{O}(|\Sigma|^3/\varepsilon)$ samples. Finally, we show that for a specific tree $T$, with $\widetilde{O}(|\Sigma|^2 n \varepsilon^{-1})$ samples from a distribution $P$ over $\Sigma^n$, one can efficiently learn the closest $T$-structured distribution in KL divergence by applying the add-1 estimator at each node.
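To make the procedure analyzed in the abstract concrete, here is a minimal Python sketch of the classical Chow-Liu algorithm with the plug-in mutual-information estimator (maximum-weight spanning tree over pairwise plug-in MI estimates), together with the add-1 (Laplace) conditional estimator appearing in the paper's final result. The function names and implementation choices are illustrative assumptions, not the authors' code, and this sketch carries none of the paper's finite-sample guarantees.

```python
import numpy as np

def plugin_mutual_information(x, y, sigma):
    """Plug-in estimate of I(X; Y) (in nats) from paired samples
    over the alphabet {0, ..., sigma - 1}."""
    joint = np.zeros((sigma, sigma))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= len(x)
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    mi = 0.0
    for a in range(sigma):
        for b in range(sigma):
            if joint[a, b] > 0:
                mi += joint[a, b] * np.log(joint[a, b] / (px[a] * py[b]))
    return mi

def chow_liu_tree(samples, sigma):
    """samples: (m, n) integer array of i.i.d. draws from P over Sigma^n.
    Returns the edge list of a maximum-weight spanning tree under
    plug-in mutual-information edge weights (Kruskal + union-find)."""
    m, n = samples.shape
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            w = plugin_mutual_information(samples[:, i], samples[:, j], sigma)
            edges.append((w, i, j))
    edges.sort(reverse=True)  # heaviest edges first

    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:           # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree

def add_one_estimate(x_child, x_parent, sigma):
    """Add-1 (Laplace) estimate of p(child | parent), the per-node
    estimator used to fit the closest T-structured distribution
    once the tree T is fixed. Row r is the distribution of the
    child given parent value r."""
    counts = np.ones((sigma, sigma))  # add-1 smoothing
    for a, b in zip(x_parent, x_child):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Example usage on synthetic data (hypothetical, for illustration only):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.integers(0, 2, size=(1000, 5))
    print(chow_liu_tree(samples, sigma=2))
```

The upper bound in the abstract says that when $P$ is itself tree-structured, feeding this procedure $\widetilde{O}(|\Sigma|^3 n \varepsilon^{-1})$ samples suffices for the recovered tree to be $\varepsilon$-approximate with constant probability.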
