GIST: Greedy Independent Set Thresholding for Max-Min Diversification with Submodular Utility
This work studies a novel subset selection problem called max-min diversification with monotone submodular utility (), which has a wide range of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal of is to maximize subject to a cardinality constraint , where is a monotone submodular function and is the max-min diversity objective. We propose the algorithm, which gives a -approximation guarantee for by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove that it is NP-hard to approximate within a factor of . Finally, we show in our empirical study that outperforms state-of-the-art benchmarks for a single-shot data sampling task on ImageNet.
View on arXiv