Nimble Algorithms for Cloud Computing

Cloud computing is a new paradigm where data is stored across multiple servers and the goal is to compute a function of all the data. We consider a simple model where each server uses polynomial time and space, but communication among servers, being more expensive, is ideally bounded by a polylogarithmic function of the input size. We dub algorithms that satisfy these resource bounds "nimble". The main contribution of the paper is to develop nimble algorithms for several areas that involve massive data and, for that reason, have been extensively studied in the context of streaming algorithms. The areas are approximation of frequency moments, counting bipartite homomorphisms (the number of copies of a fixed bipartite graph H in a graph G), rank-k approximation of a matrix, and clustering. For frequency moments, we use a new importance sampling technique based on high powers of the frequencies. We reduce the problem of counting homomorphisms to estimating implicitly defined frequency moments. For rank-k approximations, besides recent results of several authors developed in the streaming context, we use a variant of the random projection method. For clustering, we use our rank-k approximation and the small "coreset" of Chen, of size at most polynomial in the dimension. In contrast to our algorithms in the cloud computing model, known lower bounds in the streaming model for frequency moments and rank-k approximations rule out the existence of algorithms that use polylogarithmic space.
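
To make the frequency moment problem concrete, here is a minimal sketch assuming the standard definition F_k = sum over items i of f_i^k, where f_i is the total number of occurrences of item i across all servers. This is an exact, naive baseline and not the paper's nimble algorithm: every server ships its full count table, so communication grows with the number of distinct items. The names `local_counts`, `frequency_moment`, and `shards` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def local_counts(items):
    """Summarize one server's shard as a table of item -> local frequency."""
    return Counter(items)


def frequency_moment(shards, k):
    """Exact F_k = sum_i f_i**k over the union of all shards.

    Assumption: each element of `shards` is the list of items held by one
    server. Merging full count tables is the naive approach; a nimble
    algorithm would have to avoid sending this much data.
    """
    total = Counter()
    for shard in shards:
        total.update(local_counts(shard))  # add this server's counts
    return sum(f ** k for f in total.values())


# Three servers; global frequencies are a=3, b=2, c=1.
shards = [["a", "b", "a"], ["b", "c"], ["a"]]
print(frequency_moment(shards, 2))  # 3**2 + 2**2 + 1**2 = 14
```

Note that f_i^k cannot be obtained by summing per-server powers when an item is split across servers, which is why the sketch merges counts before raising them to the power k.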