
A simple extension of Azadkia & Chatterjee's rank correlation to multi-response vectors

Abstract

Recently, Chatterjee (2023) recognized the lack of a direct generalization of his rank correlation $\xi$ in Azadkia and Chatterjee (2021) to a multi-dimensional response vector. As a natural solution to this problem, we propose an extension of $\xi$ that is applicable to a set of $q \geq 1$ response variables. Our approach converts the original vector-valued problem into a univariate problem and then applies the rank correlation $\xi$ to it. Our novel measure $T$ quantifies the scale-invariant extent of functional dependence of a response vector $\mathbf{Y} = (Y_1,\dots,Y_q)$ on predictor variables $\mathbf{X} = (X_1,\dots,X_p)$. It characterizes independence of $\mathbf{X}$ and $\mathbf{Y}$ as well as perfect dependence of $\mathbf{Y}$ on $\mathbf{X}$, and hence fulfills all the characteristics of a measure of predictability. Aiming at maximum interpretability, we provide various invariance results for $T$ as well as a closed-form expression in multivariate normal models. Building upon the graph-based estimator for $\xi$ in Azadkia and Chatterjee (2021), we obtain a non-parametric, strongly consistent estimator for $T$ and show its asymptotic normality. Based on this estimator, we develop a model-free, dependence-based feature ranking and forward feature selection for multiple-outcome data. Simulation results and real case studies illustrate the broad applicability of $T$.
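The estimator of $T$ builds on the graph-based estimator of $\xi$ from Azadkia and Chatterjee (2021). The sketch below illustrates only that univariate building block, not $T$ itself, whose construction is not spelled out in this abstract. For a scalar response $Y$ and predictors $\mathbf{X}$, the estimator is computed from three quantities:

- $R_i = \#\{j : Y_j \le Y_i\}$,
- $L_i = \#\{j : Y_j \ge Y_i\}$,
- $N(i)$, the index of the Euclidean nearest neighbour of $\mathbf{X}_i$ among the other observations.

The statistic is then $\xi_n = \sum_i \big(n \min\{R_i, R_{N(i)}\} - L_i^2\big) / \sum_i L_i (n - L_i)$.

The function name `xi_n`, its `seed` parameter, the brute-force $O(n^2)$ neighbour search, and the random tie-breaking are illustrative assumptions of this sketch. It is not the authors' implementation.

```python
import math
import random
from typing import Sequence


def xi_n(x: Sequence[Sequence[float]], y: Sequence[float], seed: int = 0) -> float:
    """Hypothetical sketch of the nearest-neighbour estimate of xi(Y | X).

    x: n predictor vectors (each of length p); y: n scalar responses.
    """
    n = len(y)
    if n < 2 or len(x) != n:
        raise ValueError("need at least two paired observations")
    rng = random.Random(seed)  # used only to break distance ties

    # R_i = #{j : y_j <= y_i} and L_i = #{j : y_j >= y_i}; only ranks of y enter.
    r = [sum(1 for yj in y if yj <= yi) for yi in y]
    l = [sum(1 for yj in y if yj >= yi) for yi in y]

    def nearest(i: int) -> int:
        # Brute-force Euclidean nearest neighbour of x_i among the other points;
        # ties are broken uniformly at random.
        best, cands = math.inf, []
        for j in range(n):
            if j == i:
                continue
            d = math.dist(x[i], x[j])
            if d < best:
                best, cands = d, [j]
            elif d == best:
                cands.append(j)
        return rng.choice(cands)

    num = sum(n * min(r[i], r[nearest(i)]) - l[i] ** 2 for i in range(n))
    den = sum(l[i] * (n - l[i]) for i in range(n))
    if den == 0:
        raise ValueError("y is constant; the estimate is undefined")
    return num / den
```

Two checks follow from the formula. If $Y$ is a strictly increasing function of a scalar $X$ observed on an evenly spaced grid of 200 points, the estimate is at least $1 - 6/201 \approx 0.970$. For independent draws it fluctuates around 0. Because only the ranks of $Y$ enter through $R_i$ and $L_i$, any strictly increasing transformation of $Y$ leaves the estimate unchanged for a fixed `seed`.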
