Finding meaningful ways to measure the statistical dependency between random variables $\xi$ and $\zeta$ is a timeless statistical endeavor. In recent years, several novel concepts, like the distance covariance, have extended classical notions of dependency to more general settings. In this article, we propose and study an alternative framework that is based on optimal transport. The transport dependency $\tau \ge 0$ applies to general Polish spaces and intrinsically respects metric properties. For suitable ground costs, independence is fully characterized by $\tau = 0$. Via proper normalization of $\tau$, three transport correlations $\rho_\alpha$, $\rho_\infty$, and $\rho_*$ with values in $[0, 1]$ are defined. They attain the value $1$ if and only if $\zeta = \varphi(\xi)$, where $\varphi$ is an $\alpha$-Lipschitz function for $\rho_\alpha$, a measurable function for $\rho_\infty$, or a multiple of an isometry for $\rho_*$. The transport dependency can be estimated consistently by an empirical plug-in approach; alternative estimators with the same convergence rate but significantly reduced computational cost are also proposed. Numerical results suggest that $\tau$ robustly recovers dependency between data sets with different internal metric structures. The usage for inferential tasks, like transport dependency based independence testing, is illustrated on a data set from a cancer study.
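To make the empirical plug-in approach concrete, the following is a minimal sketch, not the authors' implementation. It assumes the transport dependency is taken as the optimal transport cost between the empirical joint distribution of the sample and the product of its empirical marginals, with the additive ground cost $d_X(x, x') + d_Y(y, y')$ and Euclidean metrics; the abstract does not spell out these choices. The function name `transport_dependency_plugin` is hypothetical. Each of the $n$ joint sample points is split into $n$ copies of mass $1/n^2$, so the problem reduces to an assignment problem on $n^2$ points, solved here with SciPy's `linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def transport_dependency_plugin(x, y):
    """Plug-in sketch: OT cost between the empirical joint law of (x_i, y_i)
    and the product of the empirical marginals (assumed definition)."""
    # Treat each sample as a row vector; 1-D inputs become shape (n, 1).
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = x.shape[0]
    if y.shape[0] != n:
        raise ValueError("x and y must contain the same number of samples")

    # Pairwise Euclidean distances within each space (assumed metrics).
    dx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

    # cost[i, j * n + k] = d_X(x_i, x_j) + d_Y(y_i, y_k): moving the joint
    # point (x_i, y_i) to the product point (x_j, y_k).
    cost = (dx[:, :, None] + dy[:, None, :]).reshape(n, n * n)

    # Split each joint point (mass 1/n) into n copies of mass 1/n^2, so both
    # measures are uniform on n^2 atoms and an optimal plan is a permutation.
    cost_rep = np.repeat(cost, n, axis=0)
    rows, cols = linear_sum_assignment(cost_rep)
    return float(cost_rep[rows, cols].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=20)
    print("dependent:  ", transport_dependency_plugin(x, x**2))
    print("independent:", transport_dependency_plugin(x, rng.normal(size=20)))
```

The assignment problem has $n^2 \times n^2$ entries, so this sketch is only practical for small samples; it does not implement the reduced-cost estimators mentioned in the abstract.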