arXiv:2411.01115

Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems

2 November 2024
Shihong Song
Guanlin Mo
Qingyuan Yang
Hu Ding
Abstract

The fairness of clustering algorithms has gained widespread attention across various areas, including machine learning. In this paper, we study fair $k$-means clustering in Euclidean space. Given a dataset comprising several groups, the fairness constraint requires that each cluster contain a proportion of points from each group within specified lower and upper bounds. Due to these fairness constraints, determining the optimal locations of the $k$ centers is a challenging task. We propose a novel "Relax and Merge" framework that returns a $(1+4\rho+O(\epsilon))$-approximate solution, where $\rho$ is the approximation ratio of an off-the-shelf vanilla $k$-means algorithm and $O(\epsilon)$ can be an arbitrarily small positive number. If equipped with a PTAS for $k$-means, our solution achieves an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints, which improves the current state-of-the-art approximation guarantee. Furthermore, using our framework, we also obtain a $(1+4\rho+O(\epsilon))$-approximate solution for the $k$-sparse Wasserstein Barycenter problem, a fundamental optimization problem in the field of optimal transport, and a $(2+6\rho)$-approximate solution for strictly fair $k$-means clustering with no violation, both of which are better than the current state-of-the-art methods. In addition, the empirical results demonstrate that our proposed algorithm can significantly outperform baseline approaches in terms of clustering cost.
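To make the fairness constraint concrete, the following is a minimal sketch of how one might check whether a given clustering respects per-group proportion bounds. It is not the paper's algorithm: the function names `fairness_violations` and `leading_ratio`, the input layout, and the assumption that every point belongs to exactly one group are all illustrative choices. The second helper only evaluates the leading constant $1+4\rho$ from the abstract and ignores the $O(\epsilon)$ term.

```python
from collections import Counter, defaultdict


def fairness_violations(labels, groups, lower, upper):
    """Return (cluster, group, share) triples that break the proportion bounds.

    labels[i] is the cluster index of point i and groups[i] is its group.
    lower[g] and upper[g] bound the share of group g inside every cluster.
    Assumption (not from the paper): each point has exactly one group.
    """
    # Count how many points of each group fall into each cluster.
    counts = defaultdict(Counter)
    for cluster, group in zip(labels, groups):
        counts[cluster][group] += 1

    violations = []
    for cluster, group_counts in counts.items():
        size = sum(group_counts.values())
        for group in lower:
            share = group_counts.get(group, 0) / size
            if not (lower[group] <= share <= upper[group]):
                violations.append((cluster, group, share))
    return violations


def leading_ratio(rho):
    """Leading constant 1 + 4*rho of the stated guarantee (O(eps) dropped)."""
    return 1 + 4 * rho


# Hypothetical toy data: two clusters, two groups, bounds of 40% to 60%.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
groups = ["a", "a", "b", "b", "a", "a", "a", "b"]
lower = {"a": 0.4, "b": 0.4}
upper = {"a": 0.6, "b": 0.6}

result = fairness_violations(labels, groups, lower, upper)
print(result)
print(leading_ratio(1.0))
```

In this toy input, cluster 0 is balanced, while cluster 1 holds three points of group `a` and one of group `b`, so both groups fall outside the 40% to 60% range there. Setting $\rho = 1$ in `leading_ratio` gives 5, which matches the $(5+O(\epsilon))$ ratio the abstract states for the PTAS case, where $\rho = 1+\epsilon$.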
