
Guessing Efficiently for Constrained Subspace Approximation

International Colloquium on Automata, Languages and Programming (ICALP), 2025
Main: 26 pages
Bibliography: 5 pages
1 table
Abstract

In this paper we study the constrained subspace approximation problem. Given a set of $n$ points $\{a_1,\ldots,a_n\}$ in $\mathbb{R}^d$, the goal of the \emph{subspace approximation} problem is to find a $k$-dimensional subspace that best approximates the input points. More precisely, for a given $p\geq 1$, we aim to minimize the $p$th power of the $\ell_p$ norm of the error vector $(\|a_1-\bm{P}a_1\|,\ldots,\|a_n-\bm{P}a_n\|)$, where $\bm{P}$ denotes the projection matrix onto the subspace and the norms are Euclidean. In \emph{constrained} subspace approximation (CSA), we additionally have constraints on the projection matrix $\bm{P}$. In its most general form, we require $\bm{P}$ to belong to a given subset $\mathcal{S}$ that is described explicitly or implicitly. We introduce a general framework for constrained subspace approximation. Our approach, which we term coreset-guess-solve, yields either $(1+\varepsilon)$-multiplicative or $\varepsilon$-additive approximations for a variety of constraints. We show that it provides new algorithms for partition-constrained subspace approximation with applications to \emph{fair} subspace approximation, $k$-means clustering, and projected non-negative matrix factorization, among others. Specifically, while we recover the best known bounds for $k$-means clustering in Euclidean spaces, we improve the known results for the remainder of the problems.
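To make the objective concrete, here is a minimal NumPy sketch (not from the paper) of the quantity being minimized: the $p$th power of the $\ell_p$ norm of the per-point Euclidean residuals $\|a_i - \bm{P}a_i\|$. For the unconstrained case with $p=2$, the optimum is the projection onto the top-$k$ right singular subspace (classical PCA via Eckart–Young); the constrained variants studied in the paper restrict $\bm{P}$ further and are not captured by this sketch.

```python
import numpy as np

def subspace_error(A, P, p=2):
    """p-th power of the l_p norm of the residual vector
    (||a_1 - P a_1||, ..., ||a_n - P a_n||), rows of A being the points."""
    residuals = np.linalg.norm(A - A @ P.T, axis=1)  # Euclidean norm per point
    return np.sum(residuals ** p)

def best_unconstrained_subspace(A, k):
    """For p = 2 without constraints, the optimal P projects onto the
    span of the top-k right singular vectors of A (Eckart-Young)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt[:k].T          # (d, k) orthonormal basis of the subspace
    return V @ V.T        # (d, d) orthogonal projection matrix

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))      # n = 100 points in R^5
P = best_unconstrained_subspace(A, k=2)
err = subspace_error(A, P, p=2)

# Sanity check: for p = 2 the error equals the sum of the squared
# singular values discarded by the rank-k truncation.
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(err, np.sum(s[2:] ** 2))
```

A constrained variant would replace `best_unconstrained_subspace` with a solver that searches only over projections in the allowed set $\mathcal{S}$ (e.g. partition-constrained subspaces), which is where the paper's coreset-guess-solve framework applies.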
