A simple extension of Azadkia & Chatterjee's rank correlation to a vector of endogenous variables
We propose a direct and natural extension of Azadkia & Chatterjee's rank correlation $T$ introduced in [4] to a set of $q \geq 1$ endogenous variables. The approach converts the original vector-valued problem into a univariate problem and then applies the rank correlation $T$ to it. The novel measure $T^q$ quantifies the scale-invariant extent of functional dependence of an endogenous vector $\mathbf{Y} = (Y_1, \ldots, Y_q)$ on a number of exogenous variables $\mathbf{X} = (X_1, \ldots, X_p)$, $p \geq 1$, and characterizes independence of $\mathbf{X}$ and $\mathbf{Y}$ as well as perfect dependence of $\mathbf{Y}$ on $\mathbf{X}$; it hence fulfills all the desired characteristics of a measure of predictability. Aiming at maximum interpretability, we provide various general invariance and continuity conditions for $T^q$ as well as novel ordering results for conditional distributions, revealing new insights into the nature of $T$. Building upon the graph-based estimator for $T$ in [4], we present a non-parametric estimator for $T^q$ that is strongly consistent in full generality, i.e., without any distributional assumptions. Based on this estimator, we develop a model-free and dependence-based feature ranking and forward feature selection for multiple-outcome data, and establish tools for identifying networks between random variables. Real case studies illustrate the main aspects of the developed methodology.
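For orientation, the following is a minimal sketch of the univariate building block only: the nearest-neighbor (graph-based) estimator of Azadkia & Chatterjee's rank correlation for a single response and a vector of covariates, written in the form usually stated for the unconditional case. It is not the vector-valued extension proposed here. The function names `nearest_neighbor_indices` and `rank_correlation`, the brute-force neighbor search, the deterministic tie handling, and the simulated data are all assumptions made for illustration.

```python
import random
from bisect import bisect_left, bisect_right


def nearest_neighbor_indices(xs):
    """Index of the Euclidean nearest neighbor of each row (brute force, O(n^2))."""
    n = len(xs)
    nn = []
    for i in range(n):
        best_j, best_d = -1, float("inf")
        for j in range(n):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
            # Simplification: ties go to the first index found,
            # rather than being broken at random.
            if d < best_d:
                best_j, best_d = j, d
        nn.append(best_j)
    return nn


def rank_correlation(ys, xs):
    """Nearest-neighbor estimate of how well a scalar y is predicted by x.

    ys: list of floats (not all equal); xs: list of equal-length tuples.
    """
    n = len(ys)
    sorted_y = sorted(ys)
    r = [bisect_right(sorted_y, y) for y in ys]     # R_i = #{j : y_j <= y_i}
    l = [n - bisect_left(sorted_y, y) for y in ys]  # L_i = #{j : y_j >= y_i}
    nn = nearest_neighbor_indices(xs)
    num = sum(n * min(r[i], r[nn[i]]) - l[i] ** 2 for i in range(n))
    den = sum(l[i] * (n - l[i]) for i in range(n))
    return num / den


# Simulated illustration (hypothetical data, fixed seed).
rng = random.Random(0)
xs = [(rng.random(), rng.random()) for _ in range(400)]
y_dep = [a + b ** 2 for a, b in xs]   # y is a noise-free function of x
y_ind = [rng.random() for _ in xs]    # y is drawn independently of x

t_dep = rank_correlation(y_dep, xs)
t_ind = rank_correlation(y_ind, xs)
print(f"functional dependence: {t_dep:.3f}")
print(f"independence:          {t_ind:.3f}")
```

In this sketch the estimate should be large when the response is a function of the covariates and close to zero when it is independent of them, which mirrors the two extreme cases the measure is designed to characterize. A k-d tree or similar structure would replace the quadratic neighbor search for larger samples.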