Proportional Volume Sampling and Approximation Algorithms for A-Optimal Design

Abstract

We study the $A$-optimal design problem where we are given vectors $v_1,\ldots,v_n\in\mathbb{R}^d$, an integer $k\geq d$, and the goal is to select a set $S$ of $k$ vectors that minimizes the trace of $\left(\sum_{i\in S}v_iv_i^\top\right)^{-1}$. Traditionally, the problem is an instance of optimal design of experiments in statistics, where each vector corresponds to a linear measurement of an unknown vector and the goal is to pick $k$ of them that minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks, sparse least squares regression, feature selection for $k$-means clustering, and matrix approximation. In this paper, we introduce proportional volume sampling to obtain improved approximation algorithms for $A$-optimal design. Given a matrix, proportional volume sampling picks a set of columns $S$ of size $k$ with probability proportional to $\mu(S)$ times $\det\left(\sum_{i\in S}v_iv_i^\top\right)$ for some measure $\mu$. Our main result shows that the approximability of the $A$-optimal design problem can be reduced to approximate independence properties of the measure $\mu$. We appeal to hard-core distributions as candidate distributions $\mu$ that allow us to obtain improved approximation algorithms for $A$-optimal design. Our results include a $d$-approximation when $k=d$, a $(1+\epsilon)$-approximation when $k=\Omega\left(\frac{d}{\epsilon}+\frac{1}{\epsilon^2}\log\frac{1}{\epsilon}\right)$, and a $\frac{k}{k-d+1}$-approximation when repetitions of vectors are allowed in the solution. We also consider a generalization of the problem for $k\leq d$ and obtain a $k$-approximation. The last result implies a restricted invertibility principle for the harmonic mean of singular values. Finally, we show that the problem is $\mathsf{NP}$-hard to approximate within a fixed constant when $k=d$.
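
To make the two central definitions concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It evaluates the $A$-optimal objective $\operatorname{tr}\left(\sum_{i\in S}v_iv_i^\top\right)^{-1}$ and draws a set $S$ of size $k$ with probability proportional to $\mu(S)\det\left(\sum_{i\in S}v_iv_i^\top\right)$ by enumerating all $\binom{n}{k}$ subsets, which is only feasible for tiny instances. The function names (`gram`, `det_and_inverse_trace`, `a_objective`, `proportional_volume_sample`) are hypothetical, and the default uniform $\mu$ is an assumption made for illustration; the abstract proposes hard-core distributions for $\mu$ instead.

```python
import itertools
import random


def gram(vectors, subset):
    """Return sum_{i in subset} v_i v_i^T as a d x d list of lists."""
    d = len(vectors[0])
    m = [[0.0] * d for _ in range(d)]
    for i in subset:
        v = vectors[i]
        for r in range(d):
            for c in range(d):
                m[r][c] += v[r] * v[c]
    return m


def det_and_inverse_trace(m):
    """Gauss-Jordan elimination with partial pivoting on [m | I].

    Returns (det(m), trace(m^{-1})); the trace is infinite if m is singular.
    """
    d = len(m)
    a = [row[:] + [1.0 if r == c else 0.0 for c in range(d)]
         for r, row in enumerate(m)]
    det = 1.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0, float("inf")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        p = a[col][col]
        det *= p
        a[col] = [x / p for x in a[col]]
        for r in range(d):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    # The right half of the augmented matrix now holds the inverse.
    return det, sum(a[r][d + r] for r in range(d))


def a_objective(vectors, subset):
    """A-optimal objective: trace of the inverse of the subset's Gram matrix."""
    return det_and_inverse_trace(gram(vectors, subset))[1]


def proportional_volume_sample(vectors, k, mu=lambda s: 1.0, rng=random):
    """Sample S with |S| = k with probability proportional to mu(S) * det(...).

    Brute force over all subsets; mu defaults to uniform (an assumption).
    """
    subsets = list(itertools.combinations(range(len(vectors)), k))
    weights = [max(0.0, mu(s) * det_and_inverse_trace(gram(vectors, s))[0])
               for s in subsets]
    return rng.choices(subsets, weights=weights, k=1)[0]
```

For example, with `vectors = [(1, 0), (0, 1), (1, 1), (1, -1)]` and `k = 2`, the subset `(2, 3)` has Gram matrix $2I$ and objective value 1, while `(0, 1)` has objective value 2; under uniform $\mu$ the sampler favors `(2, 3)` because its determinant is the largest.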
