arXiv:2206.08646
Scalable Differentially Private Clustering via Hierarchically Separated Trees

17 June 2022
Vincent Cohen-Addad
Alessandro Epasto
Silvio Lattanzi
Vahab Mirrokni
Andrés Muñoz
David Saulpic
Chris Schwiegelshohn
Sergei Vassilvitskii
Abstract

We study the private $k$-median and $k$-means clustering problems in $d$-dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art non-private methods. We prove that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot \mathrm{OPT} + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee. (The dimension term $d$ can be replaced with $O(\log k)$ using standard dimension-reduction techniques.) Although this worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical, runs in near-linear time, $\tilde{O}(nkd)$, and scales to tens of millions of points. We also show that our method is amenable to parallelization in large-scale distributed computing environments. In particular, we show that our private algorithms can be implemented in a logarithmic number of MPC rounds in the sublinear-memory regime. Finally, we complement our theoretical analysis with an empirical evaluation demonstrating the algorithm's efficiency and accuracy in comparison to other private clustering baselines.
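To make the tree-embedding idea concrete, here is a minimal, hypothetical sketch (not the paper's exact algorithm) of the standard building block behind such methods: embed points in $[0,1)^d$ into a randomly shifted quadtree and release differentially private cell counts per level, with the privacy budget split naively across levels. The function name and budget split are illustrative assumptions.

```python
import numpy as np

def noisy_quadtree_counts(points, depth, epsilon, seed=None):
    """Illustrative sketch: per-level DP cell counts of a randomly
    shifted quadtree over points in [0, 1)^d. Not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    # A uniformly random shift keeps the tree embedding low-distortion
    # in expectation (standard randomly-shifted-grid argument).
    shift = rng.random(d)
    eps_level = epsilon / depth  # naive budget split across tree levels
    levels = []
    for level in range(1, depth + 1):
        cells_per_axis = 2 ** level
        # Integer cell index of each (shifted, wrapped) point per coordinate.
        idx = np.floor(((points + shift) % 1.0) * cells_per_axis).astype(int)
        keys, counts = np.unique(idx, axis=0, return_counts=True)
        # Each point changes exactly one count per level by 1, so Laplace
        # noise with scale 1/eps_level gives eps_level-DP at this level.
        noisy = counts + rng.laplace(scale=1.0 / eps_level, size=counts.shape)
        levels.append((keys, noisy))
    return levels
```

A clustering step would then run entirely on the noisy counts (for example, greedily picking heavy subtrees as candidate centers), so the data points themselves are touched only once.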
