DASH: A Distributed and Parallelizable Algorithm for Size-Constrained Submodular Maximization

Abstract

MapReduce (MR) frameworks for maximizing monotone, submodular functions subject to a cardinality constraint (SMCC) have so far only been shown to work with linear-adaptive (non-parallelizable) algorithms, which require a large number of distributions in order to utilize the available processors. This results in severe restrictions on the cardinality constraint as well as limited scalability. Low-adaptive algorithms do not currently satisfy the requirements of these distributed MR frameworks, which limits their performance. We study the SMCC problem in a distributed setting and propose the first MR algorithms with sublinear adaptive complexity. Our algorithms, R-DASH, T-DASH, and G-DASH, provide $0.316 - \varepsilon$, $3/8 - \varepsilon$, and $1 - 1/e - \varepsilon$ approximation ratios, respectively, with nearly optimal adaptive complexity and nearly linear time complexity. Additionally, we provide a framework that increases, under some mild assumptions, the maximum permissible cardinality constraint from the $O(n / \ell^2)$ of prior MR algorithms to $O(n / \ell)$, where $n$ is the data size and $\ell$ is the number of machines; under a stronger condition on the objective function, we increase the maximum constraint value to $n$. Finally, we provide empirical evidence that our sublinear-adaptive, distributed algorithms run orders of magnitude faster than current state-of-the-art distributed algorithms.
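To make the SMCC problem and the term "linear-adaptive" concrete, here is a minimal sketch of the classical sequential greedy algorithm on a set-coverage objective, which is monotone and submodular. This is not R-DASH, T-DASH, or G-DASH; it is the standard centralized baseline, which achieves a $1 - 1/e$ ratio but needs $k$ dependent rounds, each relying on the result of the previous one. The names `coverage` and `greedy` and the toy data in the usage note are illustrative assumptions, not taken from the paper.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set


def coverage(sets: Dict[Hashable, Set[int]]) -> Callable[[Sequence[Hashable]], int]:
    """Build a coverage objective f(S) = size of the union of the chosen sets.

    Coverage functions are monotone and submodular.
    """
    def f(chosen: Sequence[Hashable]) -> int:
        covered: Set[int] = set()
        for key in chosen:
            covered |= sets[key]
        return len(covered)
    return f


def greedy(f: Callable[[Sequence[Hashable]], int],
           ground: Sequence[Hashable],
           k: int) -> List[Hashable]:
    """Classical greedy for max f(S) subject to |S| <= k.

    Each of the k rounds depends on the previous round's choice, so the
    number of adaptive rounds grows linearly in k (illustrative baseline only).
    """
    solution: List[Hashable] = []
    for _ in range(k):
        base = f(solution)
        best, best_gain = None, 0
        for x in ground:
            if x in solution:
                continue
            gain = f(solution + [x]) - base  # marginal gain of adding x
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:  # no element adds value; stop early
            break
        solution.append(best)
    return solution
```

For example, with the hypothetical input `sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1}}` and `k = 2`, calling `greedy(coverage(sets), list(sets), 2)` first selects set `2` (gain 4) and then set `0` (gain 3), covering 7 elements.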
