Column Subset Selection via Polynomial Time Dual Volume Sampling
We study dual volume sampling, a method for selecting k columns from an n*m short and wide matrix (n <= k <= m) such that the probability of selection is proportional to the volume of the parallelepiped spanned by the rows of the induced submatrix. This method was studied in [3], who motivated it as a promising method for column subset selection. However, the development of polynomial time sampling algorithms -- exact or approximate -- has been since open. We close this open problem by presenting (i) an exact (randomized) polynomial time sampling algorithm; (ii) its derandomization that samples subsets satisfying the desired properties deterministically; and (iii) an efficient approximate sampling procedure using Markov chains that are provably fast mixing. Our algorithms can thus benefit downstream applications of dual volume sampling, such as column subset selection and experimental design.
View on arXiv