arXiv:2307.07848

Fully Scalable MPC Algorithms for Clustering in High Dimension

15 July 2023
A. Czumaj
Guichen Gao
S. Jiang
Robert Krauthgamer
P. Veselý
Abstract

We design new algorithms for $k$-clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine is $n^{\sigma}$ for arbitrarily small fixed $\sigma>0$. Importantly, the local memory may be substantially smaller than $k$. Our algorithms take $O(1)$ rounds and achieve $O(1)$-bicriteria approximation for $k$-Median and for $k$-Means, namely, they compute $(1+\varepsilon)k$ clusters of cost within $O(1/\varepsilon^2)$-factor of the optimum. Previous work achieves only $\mathrm{poly}(\log n)$-bicriteria approximation [Bhaskara et al., ICML'18], or handles a special case [Cohen-Addad et al., ICML'22]. Our results rely on an MPC algorithm for $O(1)$-approximation of facility location in $O(1)$ rounds. A primary technical tool that we develop, and may be of independent interest, is a new MPC primitive for geometric aggregation, namely, computing certain statistics on an approximate neighborhood of every data point, which includes range counting and nearest-neighbor search. Our implementation of this primitive works in high dimension, and is based on consistent hashing (aka sparse partition), a technique that was recently used for streaming algorithms [Czumaj et al., FOCS'22].
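
To make the bicriteria guarantee concrete, the following minimal sequential sketch evaluates the standard $k$-Median and $k$-Means objectives and checks the two relaxed criteria from the abstract: at most $(1+\varepsilon)k$ centers, and cost within an $O(1/\varepsilon^2)$ factor of the optimum. It is not the paper's MPC algorithm. The names `clustering_cost`, `satisfies_bicriteria`, and the parameter `hidden_const` (standing in for the unspecified constant inside the $O(\cdot)$) are assumptions made for illustration only.

```python
import math


def clustering_cost(points, centers, z=1):
    """Sum over all points of (distance to the nearest center) ** z.

    z=1 gives the k-Median objective, z=2 the k-Means objective.
    """
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)


def satisfies_bicriteria(num_centers, cost, k, eps, opt_cost, hidden_const=1.0):
    """Check both relaxed criteria of a bicriteria approximation.

    hidden_const is a hypothetical stand-in for the constant hidden in
    O(1/eps^2); the abstract does not state its value.
    """
    few_enough_centers = num_centers <= (1 + eps) * k
    cheap_enough = cost <= (hidden_const / eps**2) * opt_cost
    return few_enough_centers and cheap_enough


# Toy instance: two well-separated pairs of points, one center per pair.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]

median_cost = clustering_cost(points, centers, z=1)
means_cost = clustering_cost(points, centers, z=2)
```

In this toy instance every point lies at distance 1 from its nearest center, so both objectives sum four unit terms. The check takes the optimal cost as an input because computing it exactly is hard in general; in practice one would only have a lower bound on it.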
