
A high-dimensional two-sample test for the mean using random subspaces

Abstract

A common problem in genetics is that of testing whether a set of highly dependent gene expressions differs between two populations, typically in a high-dimensional setting where the data dimension is larger than the sample size. Most high-dimensional tests for the equality of two mean vectors rely on naive diagonal or trace estimators of the covariance matrix, ignoring dependencies between variables. A test recently proposed by Lopes et al. (2012) implicitly incorporates dependencies by using random pseudo-projections to a lower-dimensional space. Their test offers higher power when the variables are dependent, but lacks desirable invariance properties and relies on asymptotic p-values that are too conservative. We illustrate how a permutation approach can be used to obtain p-values for the Lopes et al. test and how modifying the test using random subspaces leads to a test statistic that is invariant under linear transformations of the marginal distributions. The resulting test does not rely on assumptions about normality or the structure of the covariance matrix. We show by simulation that the new test has higher power than competing tests in realistic settings motivated by microarray gene expression data. We also discuss the computational aspects of high-dimensional permutation tests and provide an efficient R implementation of the proposed test.
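
To make the idea concrete, the following is a minimal Python sketch of a permutation test built on random subspaces. It is not the authors' R implementation, and the abstract does not specify the exact statistic, so several choices here are assumptions made for illustration: a "random subspace" is taken to be a random subset of k variables, the statistic is the average of the two-sample Hotelling T² over those subsets, and the function names and the defaults for `k`, `n_subspaces`, and `n_perm` are hypothetical.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 using the pooled sample covariance."""
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # keeps the k = 1 case two-dimensional
    return (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(pooled, diff)


def subspace_statistic(x, y, subsets):
    """Average T^2 over the given variable subsets (assumed form of the statistic)."""
    return float(np.mean([hotelling_t2(x[:, s], y[:, s]) for s in subsets]))


def random_subspace_test(x, y, k=3, n_subspaces=50, n_perm=200, seed=0):
    """Return (statistic, permutation p-value) for H0: equal mean vectors.

    k must be smaller than n1 + n2 - 2 so that each pooled k x k
    covariance matrix is invertible, even when p exceeds the sample size.
    """
    rng = np.random.default_rng(seed)
    p = x.shape[1]
    # Draw the subsets once and reuse them for every permutation.
    subsets = [rng.choice(p, size=k, replace=False) for _ in range(n_subspaces)]
    observed = subspace_statistic(x, y, subsets)

    pooled_data = np.vstack([x, y])
    n1 = len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled_data))  # shuffle the group labels
        xp, yp = pooled_data[idx[:n1]], pooled_data[idx[n1:]]
        if subspace_statistic(xp, yp, subsets) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `random_subspace_test(x, y)` on two arrays of shape (n1, p) and (n2, p) returns the statistic and a p-value that requires no normality assumption, since the reference distribution comes from relabelling the observations. Because each subset's T² is unchanged when individual variables are shifted or rescaled, the averaged statistic in this sketch is as well.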
