
Differentially Private Online Submodular Maximization

Abstract

In this work we consider the problem of online submodular maximization under a cardinality constraint with differential privacy (DP). A stream of $T$ submodular functions over a common finite ground set $U$ arrives online, and at each time-step the decision maker must choose at most $k$ elements of $U$ before observing the function. The decision maker obtains a payoff equal to the function evaluated on the chosen set, and aims to learn a sequence of sets that achieves low expected regret. In the full-information setting, we develop an $(\varepsilon,\delta)$-DP algorithm with expected $(1-1/e)$-regret bound of $\mathcal{O}\left( \frac{k^2\log |U|\sqrt{T \log k/\delta}}{\varepsilon} \right)$. This algorithm contains $k$ ordered experts that learn the best marginal increments for each item over the whole time horizon while maintaining privacy of the functions. In the bandit setting, we provide an $(\varepsilon,\delta+ O(e^{-T^{1/3}}))$-DP algorithm with expected $(1-1/e)$-regret bound of $\mathcal{O}\left( \frac{\sqrt{\log k/\delta}}{\varepsilon} (k (|U| \log |U|)^{1/3})^2 T^{2/3} \right)$. Our algorithms contain $k$ ordered experts that learn the best marginal item to select given the items chosen by her predecessors, while maintaining privacy of the functions. One challenge for privacy in this setting is that the payoff and feedback of expert $i$ depend on the actions taken by her $i-1$ predecessors. This particular type of information leakage is not covered by post-processing, and new analysis is required. Our techniques for maintaining privacy with feedforward may be of independent interest.
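To make the "k ordered experts" structure and the feedforward dependence concrete, here is a minimal, non-private sketch of the full-information setting. It is not the authors' algorithm. Each expert is assumed to run plain exponential weights (Hedge) over $U$, a swapped-in stand-in for the paper's private expert learner, and no DP noise or privacy accounting is included. The names `HedgeExpert`, `online_greedy`, `coverage`, the learning rate `eta`, and the toy coverage instance are all hypothetical. The point it illustrates is that expert $i$ is scored on marginal gains relative to the items picked by experts $1,\dots,i-1$, so its feedback depends on its predecessors' actions.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Set

SetFn = Callable[[FrozenSet[int]], float]


class HedgeExpert:
    """Hypothetical stand-in learner: exponential weights over the ground set.
    Non-private; the paper's DP expert algorithm is NOT reproduced here."""

    def __init__(self, n: int, eta: float, rng: random.Random) -> None:
        self.log_w = [0.0] * n  # log-weights, kept in log space for stability
        self.eta = eta
        self.rng = rng

    def sample(self) -> int:
        m = max(self.log_w)
        weights = [math.exp(lw - m) for lw in self.log_w]
        return self.rng.choices(range(len(weights)), weights=weights, k=1)[0]

    def update(self, payoffs: Sequence[float]) -> None:
        # Full-information update: every item's payoff is observed.
        for u, g in enumerate(payoffs):
            self.log_w[u] += self.eta * g


def online_greedy(fns: Sequence[SetFn], n: int, k: int, eta: float,
                  seed: int = 0) -> List[FrozenSet[int]]:
    """k ordered experts; returns the set (size <= k) played at each step."""
    rng = random.Random(seed)
    experts = [HedgeExpert(n, eta, rng) for _ in range(k)]
    played: List[FrozenSet[int]] = []
    for f in fns:
        # Commit to one item per expert BEFORE the function is revealed.
        picks = [e.sample() for e in experts]
        played.append(frozenset(picks))
        # Feedforward: expert i's payoff is the marginal gain of each item
        # given the prefix chosen by experts 1..i-1 (their actions leak in).
        prefix: FrozenSet[int] = frozenset()
        for expert, pick in zip(experts, picks):
            base = f(prefix)
            expert.update([f(prefix | {u}) - base for u in range(n)])
            prefix = prefix | {pick}
    return played


# Toy monotone submodular payoff (hypothetical): set coverage.
COVER: Dict[int, Set[int]] = {0: {0, 1}, 1: {1, 2}, 2: {3}, 3: {0, 1, 2}}


def coverage(S: FrozenSet[int]) -> float:
    return float(len(set().union(*(COVER[u] for u in S))))


sets = online_greedy([coverage] * 200, n=4, k=2, eta=0.5)
```

With a fixed function every round, the first expert concentrates on item 3, and the second expert, scored relative to that choice, concentrates on item 2, so later rounds play the full-coverage set $\{2, 3\}$. In the private setting, the difficulty the abstract points to is exactly the `prefix` passed into each update: it is computed from other experts' randomized choices, so the second expert's output is not a mere post-processing of its own private mechanism.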
