Random matrix-improved estimation of covariance matrix distances

Abstract

Given two sets $x_1^{(1)},\ldots,x_{n_1}^{(1)}$ and $x_1^{(2)},\ldots,x_{n_2}^{(2)}\in\mathbb{R}^p$ (or $\mathbb{C}^p$) of random vectors with zero mean and positive definite covariance matrices $C_1$ and $C_2\in\mathbb{R}^{p\times p}$ (or $\mathbb{C}^{p\times p}$), respectively, this article provides novel estimators for a wide range of distances between $C_1$ and $C_2$ (along with divergences between some zero-mean probability measures with covariance $C_1$ or $C_2$) of the form $\frac1p\sum_{i=1}^p f(\lambda_i(C_1^{-1}C_2))$ (with $\lambda_i(X)$ the eigenvalues of matrix $X$). These estimators are derived using recent advances in the field of random matrix theory and are asymptotically consistent as $n_1,n_2,p\to\infty$ with nontrivial ratios $p/n_1<1$ and $p/n_2<1$ (the case $p/n_2>1$ is also discussed). A first "generic" estimator, valid for a large set of functions $f$, is provided in the form of a complex integral. Then, for a selected set of functions $f$ of practical interest (namely, $f(t)=t$, $f(t)=\log(t)$, $f(t)=\log(1+st)$ and $f(t)=\log^2(t)$), a closed-form expression is provided. Besides the theoretical findings, simulation results suggest an outstanding performance advantage for the proposed estimators when compared to the classical "plug-in" estimator $\frac1p\sum_{i=1}^p f(\lambda_i(\hat C_1^{-1}\hat C_2))$ (with $\hat C_a=\frac1{n_a}\sum_{i=1}^{n_a}x_i^{(a)}x_i^{(a){\sf T}}$), even for very small values of $n_1,n_2,p$.
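For concreteness, the classical "plug-in" baseline mentioned in the abstract can be sketched in a few lines of NumPy. This is only an illustration of the baseline, not of the proposed random-matrix estimators, whose expressions are not given in the abstract. The function names `sample_covariance` and `plugin_distance` are hypothetical, and the sketch assumes real-valued data stored as columns of a $p\times n_a$ array with $p<n_1$, so that $\hat C_1$ is invertible.

```python
import numpy as np


def sample_covariance(X):
    """Sample covariance (1/n) * X X^T for zero-mean real samples.

    X has shape (p, n): each column is one observation. (Assumed layout.)
    """
    _, n = X.shape
    return (X @ X.T) / n


def plugin_distance(X1, X2, f):
    """Plug-in estimate (1/p) * sum_i f(lambda_i(C1_hat^{-1} C2_hat)).

    f must be a vectorized function acting on an array of positive eigenvalues.
    """
    C1_hat = sample_covariance(X1)
    C2_hat = sample_covariance(X2)

    # The eigenvalues of C1^{-1} C2 coincide with those of the symmetric
    # matrix L^{-1} C2 L^{-T}, where C1 = L L^T is a Cholesky factorization.
    # Working with the symmetric form keeps the eigenvalues real.
    L = np.linalg.cholesky(C1_hat)
    M = np.linalg.solve(L, C2_hat)   # L^{-1} C2
    S = np.linalg.solve(L, M.T)      # L^{-1} C2 L^{-T} (C2 is symmetric)
    S = (S + S.T) / 2                # remove round-off asymmetry

    eigenvalues = np.linalg.eigvalsh(S)
    return float(np.mean(f(eigenvalues)))
```

A call such as `plugin_distance(X1, X2, lambda t: np.log(t) ** 2)` then corresponds to the choice $f(t)=\log^2(t)$ listed in the abstract. For complex-valued data the transposes would have to be replaced by conjugate transposes.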
