Sample-Optimal Density Estimation in Nearly-Linear Time

1 June 2015
Jayadev Acharya
Ilias Diakonikolas
Jingkai Li
Ludwig Schmidt
Abstract

We design a new, fast algorithm for agnostically learning univariate probability distributions whose densities are well approximated by piecewise polynomial functions. Let $f$ be the density function of an arbitrary univariate distribution, and suppose that $f$ is $\mathrm{OPT}$-close in $L_1$-distance to an unknown piecewise polynomial function with $t$ interval pieces and degree $d$. Our algorithm draws $n = O(t(d+1)/\epsilon^2)$ samples from $f$, runs in time $\tilde{O}(n \cdot \mathrm{poly}(d))$, and with probability at least $9/10$ outputs an $O(t)$-piecewise degree-$d$ hypothesis $h$ that is $4 \cdot \mathrm{OPT} + \epsilon$ close to $f$. Our general algorithm yields (nearly) sample-optimal and nearly-linear time estimators for a wide range of structured distribution families over both continuous and discrete domains in a unified way. For most of our applications, these are the first sample-optimal and nearly-linear time estimators in the literature. As a consequence, our work resolves the sample and computational complexities of a broad class of inference tasks via a single "meta-algorithm". Moreover, we experimentally demonstrate that our algorithm performs very well in practice. Our algorithm consists of three "levels": (i) At the top level, we employ an iterative greedy algorithm for finding a good partition of the real line into the pieces of a piecewise polynomial. (ii) For each piece, we show that the sub-problem of finding a good polynomial fit on the current interval can be solved efficiently with a separation oracle method. (iii) We reduce the task of finding a separating hyperplane to a combinatorial problem and give an efficient algorithm for this problem. Combining these three procedures gives a density estimation algorithm with the claimed guarantees.
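To give a feel for the top-level greedy partitioning step, here is a minimal Python sketch restricted to the simplest case, degree $d = 0$ (a histogram). It is an illustration under several assumptions that are not taken from the abstract: the samples lie in a known interval `[lo, hi]`, the search starts from a fixed grid of `fine_bins` equal-width bins, adjacent pieces are merged in pairs, the merge cost is the plain $L_1$ error of flattening a piece to its average, and the loop stops at $2t + 1$ pieces. The function name `greedy_merge_histogram` and all parameters are hypothetical, and the polynomial-fitting and separation-oracle levels are not covered.

```python
import random

def greedy_merge_histogram(samples, t, fine_bins=256, lo=0.0, hi=1.0):
    """Piecewise-constant density estimate with at most 2*t + 1 pieces.

    Assumes every sample lies in [lo, hi]. Returns (edges, densities).
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    width = (hi - lo) / fine_bins
    counts = [0] * fine_bins
    for x in samples:
        counts[min(int((x - lo) / width), fine_bins - 1)] += 1
    n = len(samples)
    mass = [c / n for c in counts]  # empirical probability of each fine bin

    def flatten_error(start, stop):
        # L1 distance between the fine histogram on [start, stop) and the
        # constant obtained by averaging it over that range.
        seg = mass[start:stop]
        mean = sum(seg) / len(seg)
        return sum(abs(m - mean) for m in seg)

    # Each piece is a half-open range [start, stop) of fine-bin indices.
    pieces = [(i, i + 1) for i in range(fine_bins)]
    while len(pieces) > 2 * t + 1:
        # Pair up consecutive pieces; an odd leftover piece is carried over.
        pairs = [(pieces[i][0], pieces[i + 1][1])
                 for i in range(0, len(pieces) - 1, 2)]
        errors = [flatten_error(a, b) for a, b in pairs]
        order = sorted(range(len(pairs)), key=errors.__getitem__)
        keep_split = set(order[-t:])  # the t pairs that are costliest to merge
        merged = []
        for j, pair in enumerate(pairs):
            if j in keep_split:
                merged += [pieces[2 * j], pieces[2 * j + 1]]
            else:
                merged.append(pair)
        if len(pieces) % 2 == 1:
            merged.append(pieces[-1])
        pieces = merged

    edges = [lo + a * width for a, _ in pieces] + [hi]
    densities = [sum(mass[a:b]) / ((b - a) * width) for a, b in pieces]
    return edges, densities

# Example: a two-piece density (1.6 on [0, 0.5), 0.4 on [0.5, 1)).
rng = random.Random(0)
samples = [rng.uniform(0.0, 0.5) if rng.random() < 0.8 else rng.uniform(0.5, 1.0)
           for _ in range(20000)]
edges, densities = greedy_merge_histogram(samples, t=2)
```

Keeping the `t` costliest pairs split in every round is what protects true breakpoints: a pair that straddles a jump in the density has a large flattening error, so it tends to survive while pairs that differ only by sampling noise are merged.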
