
Optimal Round and Sample-Size Complexity for Partitioning in Parallel Sorting

ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2022
Abstract

State-of-the-art parallel sorting algorithms for distributed-memory architectures are based on computing a balanced partitioning via sampling and histogramming. By finding samples that partition the sorted keys into evenly-sized chunks, these algorithms minimize the number of communication rounds required. Histogramming (computing the positions of samples) guides sampling, enabling a decrease in the overall number of samples collected. We derive lower and upper bounds on the number of sampling/histogramming rounds required to compute a balanced partitioning. We improve on prior results to demonstrate that when using $p$ processors, $O(\log^* p)$ rounds with $O(p/\log^* p)$ samples per round suffice. We match this with a lower bound showing that any algorithm with $O(p)$ samples per round requires at least $\Omega(\log^* p)$ rounds. Additionally, we prove an $\Omega(p \log p)$ lower bound on the sample size for one round, thus proving that existing one-round algorithms (sample sort, AMS sort, and HSS) have optimal sample-size complexity. To derive the lower bound, we propose a hard randomized input distribution and apply classical results from the distribution theory of runs.
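As an illustration of the one-round setting the abstract refers to, the following is a minimal sketch (not the paper's algorithm) of sample-based splitter selection in the spirit of sample sort: draw a random sample, sort it, pick $p-1$ evenly spaced splitters, and histogram the resulting bucket sizes. The function names and the `oversample` parameter are hypothetical; the paper's $\Omega(p \log p)$ bound says roughly this many total samples are needed for a balanced one-round partition.

```python
import bisect
import random

def select_splitters(keys, p, oversample=64):
    """One round of sample-based partitioning (sample-sort style sketch):
    draw a random sample, sort it, and return p-1 evenly spaced splitters."""
    sample = random.sample(keys, min(len(keys), p * oversample))
    sample.sort()
    step = len(sample) // p
    return [sample[i * step] for i in range(1, p)]

def partition_sizes(keys, splitters):
    """Histogram step: count how many keys fall into each of the p buckets
    induced by the splitters (bucket i holds keys in (s_{i-1}, s_i])."""
    counts = [0] * (len(splitters) + 1)
    for k in keys:
        counts[bisect.bisect_right(splitters, k)] += 1
    return counts
```

With a sufficiently large sample, the bucket sizes concentrate around $n/p$; multi-round schemes such as HSS use the histogram from one round to focus the sampling of the next, which is what drives the sample size per round down to $O(p/\log^* p)$.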
