DASH: A Distributed and Parallelizable Algorithm for Size-Constrained Submodular Maximization

MapReduce (MR) frameworks for maximizing monotone, submodular functions subject to a cardinality constraint (SMCC) have so far only been shown to work with linear-adaptive (non-parallelizable) algorithms, which require a large number of distributions in order to utilize the available processors. This results in severe restrictions on the cardinality constraint and in limited scalability. Low-adaptive algorithms do not currently satisfy the requirements of these distributed MR frameworks, which limits their performance. We study the SMCC problem in a distributed setting and propose the first MR algorithms with sublinear adaptive complexity. Our algorithms, R-DASH, T-DASH and G-DASH, provide , , and approximation ratios, respectively, with nearly optimal adaptive complexity and nearly linear time complexity. Additionally, we provide a framework that increases, under some mild assumptions, the maximum permissible cardinality constraint from of prior MR algorithms to , where is the data size and is the number of machines; under a stronger condition on the objective function, we increase the maximum constraint value to . Finally, we provide empirical evidence that our sublinear-adaptive, distributed algorithms run orders of magnitude faster than current state-of-the-art distributed algorithms.
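To make the setting concrete, the following is a minimal sketch of the kind of baseline the abstract contrasts against, not the DASH algorithms themselves. It assumes a toy coverage objective (monotone and submodular), the standard sequential greedy routine (linear-adaptive, since each of its up to k rounds depends on the previous one), and a simple two-round partition-and-merge scheme in the MapReduce style. All names (`coverage`, `greedy`, `two_round_mr`) and the example data are hypothetical and chosen for illustration only.

```python
import random
from typing import Callable, Dict, List, Set


def coverage(sets: Dict[int, Set[int]]) -> Callable[[List[int]], int]:
    """Return f(S) = number of items covered by the sets indexed by S."""
    def f(solution: List[int]) -> int:
        covered: Set[int] = set()
        for i in solution:
            covered |= sets[i]
        return len(covered)
    return f


def greedy(f: Callable[[List[int]], int], ground: List[int], k: int) -> List[int]:
    """Standard greedy: up to k sequential rounds, each adding the best element.

    Every round needs the result of the previous one, which is why this
    routine has adaptive complexity linear in k.
    """
    solution: List[int] = []
    for _ in range(min(k, len(ground))):
        base = f(solution)
        best, best_gain = None, 0
        for e in ground:
            if e in solution:
                continue
            gain = f(solution + [e]) - base
            if gain > best_gain:  # ties keep the earliest element
                best, best_gain = e, gain
        if best is None:  # no element has positive marginal gain
            break
        solution.append(best)
    return solution


def two_round_mr(f: Callable[[List[int]], int], ground: List[int], k: int,
                 machines: int, seed: int = 0) -> List[int]:
    """Illustrative two-round scheme: partition, solve locally, merge, re-solve."""
    rng = random.Random(seed)
    parts: List[List[int]] = [[] for _ in range(machines)]
    for e in ground:
        parts[rng.randrange(machines)].append(e)
    # Round 1: each machine runs greedy on its part (in parallel in a real MR job;
    # executed one after another here for simplicity).
    local = [greedy(f, part, k) for part in parts]
    # Round 2: one machine runs greedy on the union of the local solutions.
    merged = list(dict.fromkeys(e for sol in local for e in sol))
    candidates = local + [greedy(f, merged, k)]
    return max(candidates, key=f)


# Hypothetical toy instance: five sets over the items 1..8.
example_sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1, 7}, 4: {8}}
f = coverage(example_sets)
central = greedy(f, list(example_sets), k=2)
distributed = two_round_mr(f, list(example_sets), k=2, machines=2)
print(central, f(central))
print(distributed, f(distributed))
```

In this sketch the merged candidate set in the second round can hold up to k elements per machine and must fit on a single machine, which hints at why schemes of this shape restrict how large the cardinality constraint can be relative to the data size and the number of machines.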