arXiv:2007.01118

Adapting $k$-means algorithms for outliers

2 July 2020
Christoph Grunau
Václav Rozhon
Abstract

This paper shows how to adapt several simple and classical sampling-based algorithms for the $k$-means problem to the setting with outliers. Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical $k$-means++ algorithm to the setting with outliers. However, their algorithm needs to output $O(\log (k) \cdot z)$ outliers, where $z$ is the number of true outliers, to match the $O(\log k)$-approximation guarantee of $k$-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed $k$-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output $(1+\varepsilon)z$ outliers while achieving an $O(1 / \varepsilon)$-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the distributed setting, we adapt a simple algorithm of Guha et al. (IEEE Trans. Know. and Data Engineering 2003) and the popular $k$-means$\|$ of Bahmani et al. (PVLDB 2012). A theoretical application of our techniques is an algorithm with running time $\tilde{O}(nk^2/z)$ that achieves an $O(1)$-approximation to the objective function while outputting $O(z)$ outliers, assuming $k \ll z \ll n$. This is complemented with a matching lower bound of $\Omega(nk^2/z)$ for this problem in the oracle model.
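To make the objective concrete, the following is a minimal sketch of how the $k$-means cost with outliers is commonly evaluated for a fixed set of centers: each point is charged its squared distance to the nearest center, and the farthest points are discarded before summing. It is not one of the paper's algorithms; the function names `squared_distance` and `cost_with_outliers`, the parameter `num_outliers`, and the toy data are assumptions made for illustration. In the setting of the abstract, `num_outliers` would play the role of $z$, or of $(1+\varepsilon)z$ for an algorithm allowed to discard slightly more points.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost_with_outliers(
    points: List[Point], centers: List[Point], num_outliers: int
) -> Tuple[float, List[int]]:
    """Hypothetical helper: k-means cost after discarding the farthest points.

    Returns the sum of squared distances to the nearest center over all
    points except the `num_outliers` farthest ones, together with the
    indices of the discarded points.
    """
    if not centers:
        raise ValueError("at least one center is required")
    if not 0 <= num_outliers <= len(points):
        raise ValueError("num_outliers must be between 0 and len(points)")

    # Squared distance from each point to its nearest center.
    dists = [min(squared_distance(p, c) for c in centers) for p in points]

    # Sort indices by distance; the tail of this order holds the outliers.
    order = sorted(range(len(points)), key=lambda i: dists[i])
    keep = len(points) - num_outliers
    inliers, outliers = order[:keep], sorted(order[keep:])

    return sum(dists[i] for i in inliers), outliers


# Toy data (assumed): two tight clusters plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0), (100.0, 100.0)]
centers = [(0.5, 0.0), (10.5, 10.0)]

cost, outliers = cost_with_outliers(points, centers, num_outliers=1)
print(cost, outliers)
```

On this toy input, discarding a single point removes the far-away point at index 4, which would otherwise dominate the cost. This is the reason the number of points an algorithm is allowed to discard matters as much as its approximation factor.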
