There has recently been significant interest in the machine learning community in understanding and using submodular functions. Despite this interest, little is known about submodular functions from a learning theory perspective. Motivated by applications such as pricing goods in economics, this paper considers PAC-style learning of submodular functions in a distributional setting.

A problem instance consists of a distribution on {0,1}^n and a real-valued function on {0,1}^n that is non-negative, monotone and submodular. We are given poly(n) samples from this distribution, along with the values of the function at those sample points. The task is to approximate the value of the function to within a multiplicative factor at subsequent sample points drawn from the same distribution, with sufficiently high probability.

We prove several results for this problem. (1) If the function is Lipschitz and the distribution is a product distribution, such as the uniform distribution, then a good approximation is possible: there is an algorithm that approximates the function to within a factor O(log(1/epsilon)) on a set of measure 1-epsilon, for any epsilon > 0. (2) If we do not assume that the distribution is a product distribution, then the approximation factor must be much worse: no algorithm can approximate the function to within a factor of O~(n^{1/3}) on a set of measure 1/2+epsilon, for any constant epsilon > 0. (3) On the other hand, this negative result is nearly tight: for an arbitrary distribution, there is an algorithm that approximates the function to within a factor sqrt(n) on a set of measure 1-epsilon.

Our work combines central issues in optimization (submodular functions and matroids) with central topics in learning (distributional learning and PAC-style analyses) and with central concepts in pseudo-randomness (lossless expander graphs).
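To make the setup concrete, here is a minimal Python sketch of the distributional learning model described above. The coverage function, the product-distribution sampler, the constant stand-in hypothesis, and the target factor alpha are all illustrative assumptions for this sketch, not the paper's algorithm; the point is the evaluation protocol: fit a hypothesis g on labeled samples, then check the multiplicative guarantee g(S) <= f(S) <= alpha * g(S) on fresh samples from the same distribution.

```python
import itertools
import random

def is_submodular(f, n):
    # Brute-force check of f(S) + f(T) >= f(S | T) + f(S & T) over all
    # pairs of subsets of {0, ..., n-1}; exponential in n, so tiny n only.
    subsets = [frozenset(c)
               for r in range(n + 1)
               for c in itertools.combinations(range(n), r)]
    return all(f(S) + f(T) >= f(S | T) + f(S & T)
               for S in subsets for T in subsets)

# An example non-negative, monotone, submodular function: set coverage.
# (Illustrative choice; the paper handles arbitrary such functions.)
UNIVERSE = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]  # element i covers UNIVERSE[i]

def coverage_f(S):
    return len(set().union(*(UNIVERSE[i] for i in S))) if S else 0

n = len(UNIVERSE)
assert is_submodular(coverage_f, n)

def sample_set(p=0.5):
    # Product distribution on {0,1}^n: each coordinate is 1 independently.
    return frozenset(i for i in range(n) if random.random() < p)

# Learning phase: poly(n) labeled samples (S, f(S)).
train = [(S, coverage_f(S)) for S in (sample_set() for _ in range(100))]

# A trivial stand-in hypothesis (NOT the paper's algorithm): predict the
# smallest positive training value everywhere.
lo = min((v for _, v in train if v > 0), default=1)
def g(S):
    return lo

# Evaluation phase: how often does the multiplicative guarantee
# g(S) <= f(S) <= alpha * g(S) hold on fresh samples from the same
# distribution?  alpha is a hypothetical target factor for this toy.
alpha = 4.0
test = [sample_set() for _ in range(100)]
ok = sum(1 for S in test if g(S) <= coverage_f(S) <= alpha * g(S))
print(f"guarantee holds on {ok}/100 fresh samples")
```

In the paper's terms, result (1) says that for Lipschitz functions and product distributions, the factor alpha can be taken as O(log(1/epsilon)) on a set of measure 1-epsilon, while result (3) says alpha = sqrt(n) suffices for arbitrary distributions.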