
Enabling Distributed-Memory Tensor Completion in Python using New Sparse Tensor Kernels

Abstract

Tensor computations are increasingly prevalent numerical techniques in data science, but pose unique challenges for high-performance implementation. We provide novel algorithms and systems infrastructure, together enabling the first high-level parallel implementations of three algorithms for the tensor completion problem: alternating least squares (ALS), stochastic gradient descent (SGD), and coordinate descent (CCD++). We develop these methods using a new Python interface to the Cyclops tensor algebra library, which fully automates the management of distributed-memory parallelism and sparsity for NumPy-style operations on multidimensional arrays. To make tensor completion possible for very sparse tensors, we introduce a new multi-tensor routine, TTTP, that is asymptotically more efficient than pairwise tensor contraction for key components of the tensor completion methods. In particular, we show how TTTP can be used to perform ALS via the conjugate gradient method with implicit matrix-vector products, a novel tensor completion algorithm. Further, we provide the first distributed tensor library with hypersparse matrix representations, via integration of new sequential and parallel routines into the Cyclops library. We provide microbenchmarking results on the Stampede2 supercomputer to demonstrate the efficiency of this functionality. Finally, we study the performance of the tensor completion methods for a synthetic tensor with 10 billion nonzeros and the Netflix dataset.
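
As an illustration of the TTTP operation described above, the sketch below gives a minimal NumPy reference for a third-order sparse tensor stored in coordinate form: it scales each observed entry of the tensor by the value of the rank-R CP model at that entry, in O(nnz * R) time and without forming a dense intermediate. The function name tttp_coo and the coordinate-format inputs are illustrative assumptions for this sketch, not the Cyclops interface.

```python
import numpy as np

def tttp_coo(inds, vals, factors):
    """Reference sketch of a TTTP-like kernel for a 3rd-order sparse tensor.

    inds    : (nnz, 3) integer array of nonzero coordinates (i, j, k)
    vals    : (nnz,) array of the tensor's nonzero values
    factors : list [U, V, W] of factor matrices, each with R columns

    Returns the nonzero values of the output tensor, whose (i, j, k) entry is
        vals * sum_r U[i, r] * V[j, r] * W[k, r],
    i.e., the input tensor scaled elementwise by the CP model restricted to
    the observed entries.
    """
    U, V, W = factors
    # Gather the factor rows for each nonzero and sum their elementwise
    # products over the rank index: cost O(nnz * R).
    model = np.einsum('nr,nr,nr->n',
                      U[inds[:, 0]], V[inds[:, 1]], W[inds[:, 2]])
    return vals * model

# Toy usage: a 4x5x6 tensor with 3 observed entries and rank-2 factors.
rng = np.random.default_rng(0)
inds = np.array([[0, 1, 2], [3, 4, 5], [1, 0, 0]])
vals = rng.standard_normal(3)
U = rng.standard_normal((4, 2))
V = rng.standard_normal((5, 2))
W = rng.standard_normal((6, 2))
print(tttp_coo(inds, vals, [U, V, W]))
```

Evaluating the CP model only at observed locations in this way is the kind of sampled computation that the tensor completion updates (ALS, SGD, CCD++) require, which is why a fused multi-tensor routine avoids the cost of pairwise contractions that would materialize dense intermediates.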
