
A simple extension of Azadkia & Chatterjee's rank correlation to a vector of endogenous variables

Abstract

We propose a direct and natural extension of Azadkia & Chatterjee's rank correlation $T$ introduced in [4] to a set of $q \geq 1$ endogenous variables. The approach builds upon converting the original vector-valued problem into a univariate problem and then applying the rank correlation $T$ to it. The novel measure $T^q$ then quantifies the scale-invariant extent of functional dependence of an endogenous vector ${\bf Y} = (Y_1,\dots,Y_q)$ on a number of exogenous variables ${\bf X} = (X_1,\dots,X_p)$, $p \geq 1$, characterizes independence of ${\bf X}$ and ${\bf Y}$ as well as perfect dependence of ${\bf Y}$ on ${\bf X}$, and hence fulfills all the desired characteristics of a measure of predictability. Aiming at maximum interpretability, we provide various general invariance and continuity conditions for $T^q$ as well as novel ordering results for conditional distributions, revealing new insights into the nature of $T$. Building upon the graph-based estimator for $T$ in [4], we present a non-parametric estimator for $T^q$ that is strongly consistent in full generality, i.e., without any distributional assumptions. Based on this estimator we develop a model-free and dependence-based feature ranking and forward feature selection of multiple-outcome data, and establish tools for identifying networks between random variables. Real case studies illustrate the main aspects of the developed methodology.
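
To make the building block concrete, the following is a minimal sketch of a nearest-neighbour (graph-based) estimate of $T$ for a single response $Y$ and a vector ${\bf X}$, written as the statistic $T_n = \sum_i (n \min\{R_i, R_{N(i)}\} - L_i^2) / \sum_i L_i (n - L_i)$ of [4], as we recall it. The function name `azadkia_chatterjee_t`, the brute-force neighbour search, and the deterministic tie-breaking are assumptions made for illustration; the abstract does not specify the estimator for $T^q$, and this sketch does not attempt to reproduce it.

```python
import numpy as np


def azadkia_chatterjee_t(y, x):
    """Sketch of a graph-based estimate of T(Y | X) for a univariate response.

    y: array of shape (n,); x: array of shape (n,) or (n, p).
    Hypothetical helper for illustration, not the authors' implementation.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = y.shape[0]

    # Pairwise squared Euclidean distances between the rows of x.
    diff = x[:, None, :] - x[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour

    # Index N(i) of the nearest neighbour of x_i.
    # Assumption: ties go to the lowest index; random tie-breaking is
    # the more common choice.
    nn = np.argmin(d2, axis=1)

    # R_i = #{j : y_j <= y_i} and L_i = #{j : y_j >= y_i}.
    r = (y[None, :] <= y[:, None]).sum(axis=1)
    l = (y[None, :] >= y[:, None]).sum(axis=1)

    num = np.sum(n * np.minimum(r, r[nn]) - l**2)
    den = np.sum(l * (n - l))
    return num / den


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(500, 2))
    y_dep = x[:, 0] ** 2 + x[:, 1]   # y is a function of x
    y_ind = rng.normal(size=500)     # y is independent of x
    print(azadkia_chatterjee_t(y_dep, x))
    print(azadkia_chatterjee_t(y_ind, x))
```

Values near 1 suggest that the response is close to a function of ${\bf X}$, and values near 0 suggest independence. The brute-force distance matrix costs $O(n^2)$ memory, so a tree-based neighbour search would be preferable for large samples.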
