ResearchTrend.AI
Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

24 February 2021
Romain Couillet
Florent Chatelain
N. L. Bihan
Abstract

The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis. The method consists in randomly "puncturing" both the data matrix $X\in\mathbb{C}^{p\times n}$ (or $\mathbb{R}^{p\times n}$) and its corresponding kernel (Gram) matrix $K$ through Bernoulli masks: $S\in\{0,1\}^{p\times n}$ for $X$ and $B\in\{0,1\}^{n\times n}$ for $K$. The resulting "two-way punctured" kernel is thus given by $K=\frac{1}{p}[(X\odot S)^{\sf H}(X\odot S)]\odot B$. We demonstrate that, for $X$ composed of independent columns drawn from a Gaussian mixture model, as $n,p\to\infty$ with $p/n\to c_0\in(0,\infty)$, the spectral behavior of $K$ -- its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors -- is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on GAN-generated image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering or PCA) performance. This preliminary study opens as such the path towards rethinking, from a large dimensional standpoint, computational and storage costs in elementary machine learning models.
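The two-way punctured kernel described above can be sketched in a few lines of NumPy. This is a minimal illustration of the construction $K=\frac{1}{p}[(X\odot S)^{\sf H}(X\odot S)]\odot B$, not the paper's experimental setup: the dimensions, the Bernoulli keep-probabilities, and the choice to symmetrize $B$ and keep its diagonal are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

p, n = 200, 400          # dimension and sample count (illustrative)
eps_s, eps_b = 0.3, 0.3  # Bernoulli keep-probabilities for S and B (illustrative)

# Data matrix X (real-valued here for simplicity; the paper also allows complex X)
X = rng.standard_normal((p, n))

# Bernoulli masks: S punctures the data entries, B punctures the kernel entries.
S = rng.random((p, n)) < eps_s
B = rng.random((n, n)) < eps_b
B = np.triu(B) | np.triu(B, 1).T   # symmetrize B so K stays symmetric (assumption)
np.fill_diagonal(B, True)          # keep the diagonal of K (assumption)

XS = X * S                          # X ⊙ S: punctured data
K = (XS.conj().T @ XS / p) * B      # K = (1/p) [(X⊙S)^H (X⊙S)] ⊙ B

# Roughly a fraction eps_b of the n×n entries survive, which is where the
# storage and downstream eigensolver cost savings come from.
print(K.shape, np.mean(K != 0))
```

Spectral clustering or PCA would then proceed on the dominant eigenvectors of this sparse, punctured $K$ in place of the full Gram matrix.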
