(Poly)Logarithmic Time Construction of Round-optimal $n$-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI

Abstract

We give a fast(er), communication-free, parallel construction of optimal communication schedules that allow broadcasting of $n$ distinct blocks of data from a root processor to all other processors in $1$-ported, $p$-processor networks with fully bidirectional communication. For any $p$ and $n$, broadcasting in this model requires $n-1+\lceil\log_2 p\rceil$ communication rounds. In contrast to other constructions, all processors follow the same, circulant graph communication pattern, which makes it possible to use the schedules for the allgather (all-to-all-broadcast) operation as well. The new construction takes $O(\log^3 p)$ time steps per processor, each of which can compute its part of the schedule independently of the other processors in $O(\log p)$ space. The result is a significant improvement over the sequential $O(p \log^2 p)$ time and $O(p\log p)$ space construction of Träff and Ripke (2009) with considerable practical import. The round-optimal schedule construction is then used to implement communication optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the Message-Passing Interface), and significantly and practically improves over the implementations in standard MPI libraries (mpich, OpenMPI, Intel MPI) for certain problem ranges. The application to the irregular allgather operation is entirely new.
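As a small illustration of the round count stated in the abstract, the following sketch evaluates $n-1+\lceil\log_2 p\rceil$ for given $n$ and $p$. The function name `optimal_rounds` is hypothetical and does not come from the paper, and the restriction to $p \ge 2$ and $n \ge 1$ is an assumption made here to avoid degenerate cases. It computes only the number of rounds, not the schedule construction itself.

```python
def optimal_rounds(n: int, p: int) -> int:
    """Number of rounds n - 1 + ceil(log2(p)) from the abstract.

    Hypothetical helper for illustration only. Assumes n >= 1 blocks
    and p >= 2 processors (an assumption of this sketch).
    """
    if n < 1 or p < 2:
        raise ValueError("this sketch assumes n >= 1 and p >= 2")
    # For p >= 1, (p - 1).bit_length() equals ceil(log2(p)) exactly,
    # which avoids floating-point rounding in math.log2.
    ceil_log2_p = (p - 1).bit_length()
    return n - 1 + ceil_log2_p


if __name__ == "__main__":
    for n, p in [(1, 8), (4, 5), (10, 1024)]:
        print(f"n={n}, p={p}: {optimal_rounds(n, p)} rounds")
```

With a single block ($n = 1$) the expression reduces to $\lceil\log_2 p\rceil$ rounds, and each additional block adds exactly one round.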
