Parallel inference for massive distributed spatial data using low-rank models

6 February 2014

Abstract

Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to achieve faster (parallel) computation through a divide-and-conquer scheme. In both cases, the challenge is to obtain valid inference based on all data without having to move the datasets to a central computing node. We show that for a very widely used class of spatial low-rank models, which can be written as a linear combination of spatial basis functions plus a fine-scale-variation component, parallel spatial inference and prediction for massive distributed data can be carried out exactly, meaning that the results are the same as for a traditional, non-distributed analysis. The computational cost of our distributed algorithms is linear in the number of data points, while the communication cost does not depend on the data sizes at all. After extending our results to the spatio-temporal case, we illustrate our methodology by carrying out distributed spatio-temporal particle filtering inference on total precipitable water measured by three different satellite sensor systems.

View on arXiv

Comments on this paper