
GIST: Greedy Independent Set Thresholding for Max-Min Diversification with Submodular Utility

Main: 9 Pages
4 Figures
Bibliography: 4 Pages
1 Table
Appendix: 8 Pages
Abstract

This work studies a novel subset selection problem called max-min diversification with monotone submodular utility ($\textsf{MDMS}$), which has a wide range of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal of $\textsf{MDMS}$ is to maximize $f(S) = g(S) + \lambda \cdot \texttt{div}(S)$ subject to a cardinality constraint $|S| \le k$, where $g(S)$ is a monotone submodular function and $\texttt{div}(S) = \min_{u,v \in S : u \ne v} \text{dist}(u,v)$ is the max-min diversity objective. We propose the $\texttt{GIST}$ algorithm, which gives a $\frac{1}{2}$-approximation guarantee for $\textsf{MDMS}$ by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove that it is NP-hard to approximate within a factor of $0.5584$. Finally, we show in our empirical study that $\texttt{GIST}$ outperforms state-of-the-art benchmarks for a single-shot data sampling task on ImageNet.
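To make the objective concrete, the following is a minimal Python sketch that evaluates $f(S) = g(S) + \lambda \cdot \texttt{div}(S)$ and runs a simple distance-thresholded greedy search in the spirit of the abstract's description. It is not the paper's exact $\texttt{GIST}$ procedure and carries no approximation guarantee: the threshold set (all pairwise distances plus zero), the convention $\texttt{div}(S) = 0$ for $|S| < 2$, the Euclidean metric, and all function names and the example weights are assumptions made for illustration.

```python
import itertools
import math

def dist(p, q):
    # Euclidean distance; any metric could be substituted (assumption).
    return math.dist(p, q)

def div(S):
    # Minimum pairwise distance in S.
    # Assumed convention for this sketch: 0.0 when |S| < 2.
    if len(S) < 2:
        return 0.0
    return min(dist(u, v) for u, v in itertools.combinations(S, 2))

def objective(S, g, lam):
    # f(S) = g(S) + lambda * div(S), as defined in the abstract.
    return g(S) + lam * div(S)

def greedy_with_threshold(points, g, k, d):
    # Greedily add the point with the largest marginal gain in g among
    # points at distance >= d from everything already selected.
    S = []
    while len(S) < k:
        candidates = [p for p in points
                      if p not in S and all(dist(p, s) >= d for s in S)]
        if not candidates:
            break
        S.append(max(candidates, key=lambda p: g(S + [p]) - g(S)))
    return S

def thresholded_search(points, g, k, lam):
    # Hypothetical outer loop: try every pairwise distance as a threshold
    # and keep the candidate set with the best objective value.
    thresholds = {0.0} | {dist(u, v)
                          for u, v in itertools.combinations(points, 2)}
    best, best_val = [], -math.inf
    for d in sorted(thresholds):
        S = greedy_with_threshold(points, g, k, d)
        val = objective(S, g, lam)
        if val > best_val:
            best, best_val = S, val
    return best

# Illustrative data: a modular (hence monotone submodular) utility
# given by nonnegative per-point weights.
WEIGHTS = {(0, 0): 5.0, (0, 1): 4.0, (10, 0): 3.0, (10, 1): 1.0}
POINTS = list(WEIGHTS)

def g_example(S):
    return sum(WEIGHTS[p] for p in S)

selected = thresholded_search(POINTS, g_example, k=2, lam=1.0)
```

With a threshold of zero the greedy step picks the two heaviest points, which are close together; raising the threshold forces the second pick to be far from the first, trading some utility for a much larger diversity term.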
