
Massively Parallel Maximum Coverage Revisited

Abstract

We study the maximum set coverage problem in the massively parallel model. In this setting, $m$ sets that are subsets of a universe of $n$ elements are distributed among $m$ machines. In each round, these machines can communicate with each other, subject to the memory constraint that no machine may use more than $\tilde{O}(n)$ memory. The objective is to find the $k$ sets whose coverage is maximized. We consider the regime where $k = \Omega(m)$, $m = O(n)$, and each machine has $\tilde{O}(n)$ memory. Maximum coverage is a special case of the submodular maximization problem subject to a cardinality constraint. This problem can be approximated to within a $1-1/e$ factor using the greedy algorithm, but this approach is not directly applicable to parallel and distributed models. When $k = \Omega(m)$, to obtain a $(1-1/e-\epsilon)$-approximation, previous work either requires $\tilde{O}(mn)$ memory per machine, which is not interesting compared to the trivial algorithm that sends the entire input to a single machine, or requires $2^{O(1/\epsilon)} n$ memory per machine, which is prohibitively expensive even for a moderately small value of $\epsilon$. Our result is a randomized $(1-1/e-\epsilon)$-approximation algorithm that uses $O(1/\epsilon^3 \cdot \log m \cdot (\log (1/\epsilon) + \log m))$ rounds. Our algorithm involves solving a slightly transformed linear program of the maximum coverage problem using the multiplicative weights update method, classic techniques in parallel computing such as parallel prefix, and various combinatorial arguments.
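For reference, the sequential greedy baseline mentioned in the abstract can be sketched as follows. This is a minimal single-machine illustration of the classic $(1-1/e)$-approximation, not the paper's parallel algorithm; the function names `greedy_max_coverage` and `coverage` and the toy input are assumptions made for this example.

```python
from typing import Dict, List, Set


def greedy_max_coverage(sets: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick up to k sets, each time taking the set with the largest marginal gain.

    Hypothetical helper for illustration: the classic sequential greedy,
    not the massively parallel algorithm described in the abstract.
    """
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(sets)  # shallow copy so the caller's dict is not mutated
    for _ in range(min(k, len(sets))):
        # Marginal gain of a set = number of its elements not yet covered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds new elements
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def coverage(sets: Dict[str, Set[int]], chosen: List[str]) -> int:
    """Number of distinct elements covered by the chosen sets."""
    covered: Set[int] = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)


if __name__ == "__main__":
    toy = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1}}
    picked = greedy_max_coverage(toy, k=2)
    print(picked, coverage(toy, picked))
```

Each greedy step depends on the elements covered by all earlier picks, so the loop is inherently sequential in $k$; this dependency is why the approach does not transfer directly to parallel and distributed models.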
