ResearchTrend.AI


arXiv:1410.6801

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

24 October 2014
Michael B. Cohen
Sam Elder
Cameron Musco
Christopher Musco
Madalina Persu
Abstract

We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only `cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.
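To make the dimensionality-reduction idea concrete, here is a minimal NumPy sketch (not the paper's exact algorithm or constants): project the data with a Gaussian Johnson-Lindenstrauss matrix to roughly $O(\log k/\epsilon^2)$ dimensions, run a simple Lloyd's-style k-means in the sketched space, and evaluate the resulting assignment back in the original space. The synthetic data, the constant multiplier in the target dimension, and the `lloyd` helper are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k, eps = 1000, 200, 5, 0.5
# Synthetic data: k well-separated Gaussian clusters in d dimensions.
centers = rng.normal(scale=10.0, size=(k, d))
A = centers[rng.integers(k, size=n)] + rng.normal(size=(n, d))

# JL projection to m ~ O(log k / eps^2) dimensions.
# The constant factor 4 is an arbitrary illustrative choice.
m = 4 * int(np.ceil(np.log(k) / eps**2))
S = rng.normal(size=(d, m)) / np.sqrt(m)   # scaled Gaussian sketch matrix
A_sketch = A @ S                           # n x m, much smaller than n x d

def kmeans_cost(X, labels, k):
    """Sum of squared distances from points to their cluster centroids."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in range(k) if np.any(labels == j))

def lloyd(X, k, iters=20, seed=1):
    """Plain Lloyd's iterations (illustrative helper, random init)."""
    r = np.random.default_rng(seed)
    C = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels

# Cluster in the cheap sketched space, then score that clustering
# against the cost of clustering directly in the original space.
labels_sketch = lloyd(A_sketch, k)
labels_full = lloyd(A, k)
ratio = kmeans_cost(A, labels_sketch, k) / kmeans_cost(A, labels_full, k)
print(f"sketch dims: {m} (from {d}), cost ratio vs. full-dim k-means: {ratio:.3f}")
```

On well-separated data like this, the cost ratio is typically close to 1, far inside the $(9+\epsilon)$ guarantee; the point is that the clustering work happens in $m \ll d$ dimensions.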
