High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence

Abstract

Given i.i.d. observations of a random vector $X \in \mathbb{R}^p$, we study the problem of estimating both its covariance matrix $\Sigma^*$, and its inverse covariance or concentration matrix $\Theta^* = (\Sigma^*)^{-1}$. We estimate $\Theta^*$ by minimizing an $\ell_1$-penalized log-determinant Bregman divergence; in the multivariate Gaussian case, this approach corresponds to $\ell_1$-penalized maximum likelihood, and the structure of $\Theta^*$ is specified by the graph of an associated Gaussian Markov random field. We analyze the performance of this estimator under high-dimensional scaling, in which the number of nodes in the graph $p$, the number of edges $s$, and the maximum node degree $d$ are allowed to grow as a function of the sample size $n$. In addition to the parameters $(p, s, d)$, our analysis identifies other key quantities that control rates: (a) the $\ell_\infty$-operator norm of the covariance matrix $\Sigma^*$; (b) the $\ell_\infty$-operator norm of the sub-matrix $\Gamma^*_{SS}$, where $S$ indexes the graph edges, and $\Gamma^* = (\Theta^*)^{-1} \otimes (\Theta^*)^{-1}$; (c) a mutual incoherence or irrepresentability measure on the matrix $\Gamma^*$; and (d) the rate of decay $1/f(n,\delta)$ of the probabilities $\{|\hat{\Sigma}^n_{ij} - \Sigma^*_{ij}| > \delta\}$, where $\hat{\Sigma}^n$ is the sample covariance based on $n$ samples. Our first result establishes consistency of our estimate $\hat{\Theta}$ in the elementwise maximum norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degree $d = o(\sqrt{s})$. In our second result, we show that with probability converging to one, the estimate $\hat{\Theta}$ correctly specifies the zero pattern of the concentration matrix $\Theta^*$.
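Concretely, the program being solved is $\hat{\Theta} \in \arg\min_{\Theta \succ 0} \{\mathrm{tr}(\hat{\Sigma}^n \Theta) - \log\det\Theta + \lambda_n \|\Theta\|_{1,\mathrm{off}}\}$, with the $\ell_1$ penalty applied to off-diagonal entries; this is the estimator widely known as the graphical lasso. As a minimal illustration (not part of the paper), the sketch below fits it with scikit-learn's GraphicalLasso solver on synthetic Gaussian data; the data-generating matrices and the penalty value are invented for the example, with `alpha` standing in for $\lambda_n$.

```python
# A minimal sketch (not from the paper): fitting the l1-penalized
# log-determinant (graphical lasso) estimator with scikit-learn.
# All parameter values here are illustrative assumptions.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Synthetic sparse Gaussian model: p = 5 nodes, a single edge (0, 1).
p, n = 5, 200
theta_true = np.eye(p)
theta_true[0, 1] = theta_true[1, 0] = 0.4
sigma_true = np.linalg.inv(theta_true)
X = rng.multivariate_normal(np.zeros(p), sigma_true, size=n)

# alpha plays the role of lambda_n; the solver minimizes
# tr(Sigma_hat Theta) - log det Theta + alpha * ||Theta||_{1,off}.
model = GraphicalLasso(alpha=0.1).fit(X)
theta_hat = model.precision_  # estimate of Theta* = (Sigma*)^{-1}

# Inspect the recovered zero pattern of the concentration matrix.
print(np.round(theta_hat, 2))
print("estimated edge support:",
      np.abs(theta_hat[np.triu_indices(p, k=1)]) > 1e-4)
```

With a suitably chosen `alpha`, the estimated precision matrix is zero off the true edge set, mirroring the model-selection consistency result stated in the abstract.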
