arXiv:2107.05766

Likelihood estimation of sparse topic distributions in topic models and its applications to Wasserstein document distance calculations

12 July 2021
Xin Bing
F. Bunea
Seth Strimas-Mackey
M. Wegkamp
Abstract

This paper studies the estimation of high-dimensional, discrete, possibly sparse, mixture models in topic models. The data consists of observed multinomial counts of $p$ words across $n$ independent documents. In topic models, the $p \times n$ expected word frequency matrix is assumed to be factorized as the product of a $p \times K$ word-topic matrix $A$ and a $K \times n$ topic-document matrix $T$. Since columns of both matrices represent conditional probabilities belonging to probability simplices, columns of $A$ are viewed as $p$-dimensional mixture components that are common to all documents, while columns of $T$ are viewed as the $K$-dimensional mixture weights that are document specific and are allowed to be sparse. The main interest is to provide sharp, finite sample, $\ell_1$-norm convergence rates for estimators of the mixture weights $T$ when $A$ is either known or unknown. For known $A$, we suggest maximum likelihood estimation (MLE) of $T$. Our non-standard analysis of the MLE not only establishes its $\ell_1$ convergence rate, but reveals a remarkable property: the MLE, with no extra regularization, can be exactly sparse and contain the true zero pattern of $T$. We further show that the MLE is both minimax optimal and adaptive to the unknown sparsity in a large class of sparse topic distributions. When $A$ is unknown, we estimate $T$ by optimizing the likelihood function corresponding to a plug-in, generic, estimator $\hat{A}$ of $A$. For any estimator $\hat{A}$ that satisfies carefully detailed conditions for proximity to $A$, the resulting estimator of $T$ is shown to retain the properties established for the MLE. The ambient dimensions $K$ and $p$ are allowed to grow with the sample sizes. Our application is to the estimation of 1-Wasserstein distances between document generating distributions. We propose, estimate and analyze new 1-Wasserstein distances between two probabilistic document representations.
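
As a rough illustration of the known-word-topic-matrix case described in the abstract, the sketch below computes the maximum likelihood estimate of a single document's topic weights by maximizing the multinomial log-likelihood over the probability simplex. The abstract does not say which solver the authors use. The multiplicative EM update here is a standard choice for mixture weights with known components and is an assumption of this sketch, as are the function name `mle_topic_weights`, the simulated matrix `A`, and all sizes in the demo.

```python
import numpy as np


def mle_topic_weights(A, x, n_iter=500, tol=1e-10):
    """Approximate argmax over the simplex of sum_j x_j * log((A t)_j).

    A : (p, K) word-topic matrix, each column a probability vector.
    x : (p,) observed word counts for one document.
    Uses EM (multiplicative) updates; this solver is an illustrative choice.
    """
    p, K = A.shape
    freq = x / x.sum()                 # observed word frequencies
    t = np.full(K, 1.0 / K)            # start at the centre of the simplex
    for _ in range(n_iter):
        mix = A @ t                    # fitted word probabilities, length p
        # freq_j / mix_j, with 0 wherever the fitted probability is 0
        ratio = np.divide(freq, mix, out=np.zeros_like(freq), where=mix > 0)
        t_new = t * (A.T @ ratio)      # EM update for the mixture weights
        t_new /= t_new.sum()           # guard against round-off drift
        converged = np.abs(t_new - t).sum() < tol
        t = t_new
        if converged:
            break
    return t


# Simulated example (all values hypothetical): a sparse topic vector.
rng = np.random.default_rng(0)
p, K, N = 50, 4, 5000
A = rng.dirichlet(np.ones(p), size=K).T    # (p, K), columns sum to 1
t_true = np.array([0.7, 0.3, 0.0, 0.0])    # two topics are absent
x = rng.multinomial(N, A @ t_true)         # word counts for one document
t_hat = mle_topic_weights(A, x)
print("estimate:", np.round(t_hat, 4))
print("l1 error:", np.abs(t_hat - t_true).sum())
```

Because the update is multiplicative, coordinates that start positive stay positive at every iteration, so the iterates can only approach zero entries without reaching them. The exact sparsity discussed in the abstract is a property of the maximizer itself, so a solver that can land on the boundary of the simplex would be needed to observe it directly.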
