Holistic Cube Analysis: A Query Framework for Data Insights
We present Holistic Cube Analysis (HoCA), a framework that augments the query capabilities of SQL for analyzing a space of non-uniform tables for data insights. In HoCA, we first define the abstract cube, a data type that models "a space of non-uniform tables". We then describe two operators over abstract cubes: cube crawling and cube join. Cube crawling is an operator for exploring a subspace of tables and extracting signals from each table. It implements a visitor pattern and allows one to program the "region analysis" performed on individual tables. Cube join, in turn, allows one to join two cubes for deeper analysis. The power of the current HoCA framework comes from multi-model crawling, programmable models, and the composition of operators in conjunction with SQL. We describe a variety of data-insights applications of HoCA in system monitoring, experimentation analysis, and business intelligence. Finally, we discuss avenues for extending the framework, such as devising more useful HoCA operators.
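To make the two operators concrete, the following is a minimal Python sketch of the idea, not the authors' implementation. It assumes a toy relation with two dimension columns, treats a region as a tuple of fixed (dimension, value) pairs, and models cube crawling as visiting every region and applying a user-supplied model to that region's table. The names `ROWS`, `DIMS`, `regions`, `table_for`, `crawl`, `mean_latency`, `row_count`, and `cube_join` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy relation: two dimension columns and one metric column.
ROWS = [
    {"country": "US", "device": "mobile", "latency": 120},
    {"country": "US", "device": "desktop", "latency": 80},
    {"country": "DE", "device": "mobile", "latency": 200},
    {"country": "DE", "device": "desktop", "latency": 90},
]
DIMS = ("country", "device")


def regions(rows, dims):
    """Enumerate regions: each fixes a subset of dimensions to values seen in the data."""
    seen = set()
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            for row in rows:
                seen.add(tuple((d, row[d]) for d in subset))
    return sorted(seen)


def table_for(rows, region):
    """Return the table (list of rows) that belongs to one region."""
    return [r for r in rows if all(r[d] == v for d, v in region)]


def crawl(rows, dims, model):
    """Cube crawling sketch: visit every region and apply `model` to its table."""
    return {region: model(table_for(rows, region)) for region in regions(rows, dims)}


def mean_latency(table):
    """Example region analysis: average latency within one region."""
    return sum(r["latency"] for r in table) / len(table)


def row_count(table):
    """Example region analysis: number of rows within one region."""
    return len(table)


def cube_join(left, right):
    """Cube join sketch: pair up the signals of two crawls on their shared regions."""
    return {region: (left[region], right[region]) for region in left.keys() & right.keys()}


# Multi-model crawling: two models over the same space, then joined per region.
means = crawl(ROWS, DIMS, mean_latency)
counts = crawl(ROWS, DIMS, row_count)
joined = cube_join(means, counts)
```

Here the empty region `()` stands for the whole relation, and the per-region tables differ in size, which is a simplified stand-in for the non-uniformity the abstract describes. Passing the model as a plain function mirrors the visitor pattern: the crawl decides which tables to visit, and the model decides what to extract from each one.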