Discovering Relationships and their Structures Across Disparate Data Modalities

Abstract

Determining how certain properties are related to other properties is fundamental to scientific discovery. As data collection rates accelerate, it is becoming increasingly difficult, yet ever more important, to determine whether one property of data (e.g., cloud density) is related to another (e.g., grass wetness). Further investigation into the geometry of a relationship is warranted only if the two properties are in fact related. While existing approaches can test whether two properties are related, they may require infeasibly large sample sizes in real-data scenarios, and they do not address how the properties are related. Our key insight is that one can adaptively restrict the analysis to the "jointly local" observations; that is, one can estimate the scales with the most informative neighbors for determining the existence and geometry of a relationship. "Multiscale Graph Correlation" (MGC) is a framework that extends global procedures to be multiscale; consequently, MGC tests typically require far fewer samples than existing methods for a wide variety of dependence structures and dimensionalities, while maintaining computational efficiency. Moreover, MGC provides a simple and elegant multiscale characterization of the potentially complex latent geometry underlying the relationship. In several real data applications, MGC uniquely detects the presence and reveals the geometry of the relationships.
