
On computing and the complexity of computing higher-order U-statistics, exactly

Main: 29 pages
16 figures
Bibliography: 1 page
9 tables
Appendix: 19 pages
Abstract

Higher-order U-statistics abound in fields such as statistics, machine learning, and computer science, but they are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their computational complexity is surprisingly lacking. This paper aims to fill that gap by presenting several results on the computational aspects of U-statistics. First, we derive a useful decomposition of an m-th order U-statistic into a linear combination of V-statistics of orders not exceeding m, which are generally more feasible to compute. Second, we explore the connection between exactly computing V-statistics and Einstein summation, a tool often used in computational mathematics, quantum computing, and quantum information sciences to accelerate tensor computations. Third, we provide an optimistic estimate of the time complexity of exactly computing U-statistics, based on the treewidth of a particular graph associated with the U-statistic kernel. Together, these ingredients yield a new and much more runtime-efficient algorithm for exactly computing general higher-order U-statistics. We also wrap our new algorithm into an open-source Python package called u-stats. We demonstrate via three statistical applications that u-stats achieves impressive runtime performance compared to existing benchmarks. This paper aspires to achieve two goals: (1) to capture the interest of researchers in statistics and related areas in further advancing the algorithmic development of U-statistics, and (2) to offer the package u-stats as a valuable tool for practitioners, making the implementation of methods based on higher-order U-statistics a more delightful experience.
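To make the first two ingredients concrete, here is a minimal sketch. It is not the u-stats package's algorithm or API. It assumes a hypothetical third-order kernel h(i, j, k) = K[i, j] * K[j, k] built from an n-by-n matrix K, averaged over ordered triples of distinct indices. The sum over distinct indices cannot be written directly as one Einstein summation. Inclusion-exclusion over index coincidences (Möbius inversion on the partitions of {i, j, k}) rewrites it as a signed combination of unrestricted, V-statistic-type sums, each of which is a single `numpy.einsum` call. This is one standard way to obtain such a decomposition; the paper's exact construction may differ. The brute-force function is included only for checking.

```python
import itertools

import numpy as np


def u_stat_path3_einsum(K: np.ndarray) -> float:
    """Third-order U-statistic for the hypothetical kernel h(i, j, k) = K[i, j] * K[j, k].

    Averages over ordered triples of distinct indices, computed from
    unrestricted (V-statistic-type) sums via inclusion-exclusion.
    """
    n = K.shape[0]
    # Unrestricted sums: each index ranges over all n values, coincidences allowed.
    s_all = np.einsum("ij,jk->", K, K, optimize=True)  # no constraint
    s_ij = np.einsum("ii,ik->", K, K)                  # i == j
    s_jk = np.einsum("ij,jj->", K, K)                  # j == k
    s_ik = np.einsum("ij,ji->", K, K)                  # i == k
    s_ijk = np.einsum("ii,ii->", K, K)                 # i == j == k
    # Moebius weights over partitions of {i, j, k}:
    # +1 (all separate), -1 (each single merged pair), +2 (all three merged).
    s_distinct = s_all - s_ij - s_jk - s_ik + 2.0 * s_ijk
    return float(s_distinct / (n * (n - 1) * (n - 2)))


def u_stat_path3_brute(K: np.ndarray) -> float:
    """Reference O(n^3) loop over ordered triples of distinct indices."""
    n = K.shape[0]
    total = sum(K[i, j] * K[j, k] for i, j, k in itertools.permutations(range(n), 3))
    return float(total / (n * (n - 1) * (n - 2)))


rng = np.random.default_rng(0)
K = rng.standard_normal((6, 6))
print(np.isclose(u_stat_path3_einsum(K), u_stat_path3_brute(K)))  # True
```

Each einsum term involves at most two free indices after the coincidences are imposed. Passing `optimize=True` lets NumPy choose a contraction order, which is the kind of tensor-contraction optimization that makes V-statistics cheaper to evaluate than a direct loop over distinct tuples.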
