Improved Learning-augmented Algorithms for k-means and k-medians Clustering

31 October 2022
Thy Nguyen
Anamay Chaturvedi
Huy Le Nguyen
arXiv:2210.17028
Abstract

We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space and a label for each data point, given by an oracle, indicating which subsets of points should be clustered together. This setting captures situations where we have access to auxiliary information about the data set relevant to our clustering objective, for instance the labels output by a neural network. Following prior work, we assume that each predicted cluster contains at most an $\alpha \in (0, c)$ fraction of false positives and false negatives, for some $c < 1$; in the absence of these errors the labels would attain the optimal clustering cost $\mathrm{OPT}$. For a dataset of size $m$, we propose a deterministic $k$-means algorithm that produces centers with an improved bound on the clustering cost compared to the previous randomized algorithm, while preserving the $O(dm \log m)$ runtime. Furthermore, our algorithm works even when the predictions are not very accurate: our bound holds for $\alpha$ up to $1/2$, an improvement over the requirement $\alpha \le 1/7$ in the previous work. For the $k$-medians problem, we improve upon prior work by achieving a biquadratic improvement in the dependence of the approximation factor on the accuracy parameter $\alpha$, obtaining a cost of $(1 + O(\alpha))\mathrm{OPT}$ while requiring essentially just $O(md \log^3 m / \alpha)$ runtime.
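The sketch below is only an illustration of the learning-augmented setting described in the abstract, not the paper's algorithm: it derives one center per predicted cluster from the noisy oracle labels (a plain or lightly trimmed mean, a heuristic choice) and evaluates the resulting k-means cost. All names and parameters here (centers_from_predictions, trim_fraction, the synthetic data) are illustrative assumptions.

```python
# Minimal sketch of the learning-augmented k-means setting (not the paper's
# algorithm): points in R^d plus noisy oracle labels, one center per
# predicted cluster, evaluated under the k-means objective.
import numpy as np

def centers_from_predictions(X, predicted_labels, trim_fraction=0.0):
    """Return one center per predicted cluster.

    With trim_fraction = 0 this is the plain coordinate-wise mean; a small
    positive value drops the points farthest from the mean before
    recomputing it, a simple heuristic guard against mislabeled points.
    """
    centers = []
    for label in np.unique(predicted_labels):
        cluster = X[predicted_labels == label]
        center = cluster.mean(axis=0)
        if trim_fraction > 0 and len(cluster) > 1:
            dists = np.linalg.norm(cluster - center, axis=1)
            keep = dists <= np.quantile(dists, 1.0 - trim_fraction)
            center = cluster[keep].mean(axis=0)
        centers.append(center)
    return np.vstack(centers)

def kmeans_cost(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.min(axis=1).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated Gaussian clusters in R^2.
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
    true_labels = np.repeat([0, 1], 100)
    # Oracle predictions with a small fraction of flipped labels.
    noisy = true_labels.copy()
    flip = rng.random(200) < 0.1
    noisy[flip] = 1 - noisy[flip]
    centers = centers_from_predictions(X, noisy, trim_fraction=0.1)
    print("k-means cost with predicted labels:", kmeans_cost(X, centers))
```

In this toy setup the predicted labels already pin down reasonable centers; the paper's contribution is a deterministic procedure with provable cost guarantees when up to an $\alpha$ fraction of each predicted cluster is mislabeled.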
