
Discovering Relationships Across Disparate Data Modalities

Abstract

Determining whether certain properties are related to other properties is fundamental to scientific discovery. As data collection rates accelerate, it is becoming both harder and more important to determine whether one property of data (e.g., cloud density) is related to another (e.g., grass wetness). Only if two properties are related does it make sense to further investigate the nature of the relationship. Existing approaches excel in different settings, but no single approach dominates across all relationships and sample sizes, particularly for structured high-dimensional data and nonlinear relationships. We combine hypothesis testing, manifold learning, and harmonic analysis to obtain Multiscale Generalized Correlation (MGC). Our key insight is that we can adaptively restrict the analysis to the most informative "jointly local" observations, that is, observations that are nearest neighbors for both of the properties being compared. We prove that MGC statistically dominates previous approaches, even for finite samples, while maintaining computational efficiency. We used MGC to detect the presence and reveal the nature of the relationships between brain properties (including activity, shape, and connectivity) and mental properties (including personality, health, and creativity), while avoiding the false positive inflation problem that has plagued conventional parametric approaches.
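To make the "jointly local" idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes Euclidean distances, double-centered distance matrices, and a fixed pair of neighborhood sizes `k` and `l`; the function names (`pairwise_distances`, `neighbor_mask`, `local_correlation`) are hypothetical and chosen for illustration only. The abstract does not specify how scales are selected or how significance is assessed, so neither step is shown here.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distance matrix for n samples (rows of x)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def neighbor_mask(dist, k):
    """True where sample j is among the k nearest neighbors of sample i
    (each sample counts as its own rank-0 neighbor)."""
    ranks = dist.argsort(axis=1).argsort(axis=1)
    return ranks <= k

def center(dist):
    """Double-center a distance matrix (an assumed centering scheme)."""
    return (dist
            - dist.mean(axis=0, keepdims=True)
            - dist.mean(axis=1, keepdims=True)
            + dist.mean())

def local_correlation(x, y, k, l):
    """Correlation of centered distances restricted to pairs that are
    neighbors in BOTH x (within k) and y (within l)."""
    dx, dy = pairwise_distances(x), pairwise_distances(y)
    a, b = center(dx), center(dy)
    mask_x, mask_y = neighbor_mask(dx, k), neighbor_mask(dy, l)
    joint = mask_x & mask_y  # the "jointly local" pairs
    num = (a * b * joint).sum()
    den = np.sqrt((a ** 2 * mask_x).sum() * (b ** 2 * mask_y).sum())
    return num / den if den > 0 else 0.0

# Illustrative usage on a nonlinear relationship (synthetic data, assumed).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = x ** 2 + 0.1 * rng.normal(size=50)
global_stat = local_correlation(x, y, k=49, l=49)  # all pairs included
local_stat = local_correlation(x, y, k=10, l=10)   # jointly local pairs only
```

With `k` and `l` both set to `n - 1`, every pair is included and the statistic in this sketch reduces to a global distance-based correlation; smaller values keep only the pairs that are close in both properties.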
