Efficient Parallelization of a Ubiquitous Sequential Computation

Abstract

We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} + b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$. On $n$ parallel processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form are ubiquitous in science and engineering, making efficient parallelization useful for a vast number of applications. We implement our expression in software, test it on parallel hardware, and verify that it executes faster than sequential computation by a factor of $\frac{n}{\log n}$.
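The abstract does not spell out the expression itself, so the following is only a minimal sketch of one way two prefix sums can reproduce the recurrence, not necessarily the authors' exact formulation. It relies on the unrolled form $x_t = A_t \left( x_0 + \sum_{s \le t} b_s / A_s \right)$ with $A_t = a_1 a_2 \cdots a_t$, and it assumes every $a_t > 0$ so that $\log a_t$ is real. The function names are hypothetical, and `itertools.accumulate` runs sequentially here; on parallel hardware each of the two scans would be replaced by a parallel prefix sum.

```python
from itertools import accumulate
import math

def recurrence_sequential(a, b, x0):
    """Reference: x_t = a_t * x_{t-1} + b_t, computed one step at a time."""
    xs, x = [], x0
    for a_t, b_t in zip(a, b):
        x = a_t * x + b_t
        xs.append(x)
    return xs

def recurrence_two_prefix_sums(a, b, x0):
    """Same sequence via two prefix sums (assumption: every a_t > 0)."""
    # First prefix sum: log_A[t] = sum of log a_s for s <= t,
    # so exp(log_A[t]) equals the running product a_1 * ... * a_t.
    log_A = list(accumulate(math.log(a_t) for a_t in a))
    # Second prefix sum: S[t] = sum of b_s / (a_1 * ... * a_s) for s <= t.
    S = list(accumulate(b_t * math.exp(-la) for b_t, la in zip(b, log_A)))
    # Combine elementwise: x_t = (a_1 * ... * a_t) * (x_0 + S[t]).
    return [math.exp(la) * (x0 + s) for la, s in zip(log_A, S)]

if __name__ == "__main__":
    a = [0.5, 2.0, 1.5]
    b = [1.0, -1.0, 0.25]
    print(recurrence_sequential(a, b, 3.0))
    print(recurrence_two_prefix_sums(a, b, 3.0))
```

Both functions should agree up to floating-point rounding. Working in log space keeps the first scan a sum rather than a product, but dividing by the running product can lose precision or overflow when that product becomes very small or very large, so this sketch is meant for illustration rather than production use.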
