
Permutation p-value approximation via generalized Stolarsky invariance

Abstract

It is common for genomic data analysis to use $p$-values from a large number of permutation tests. The multiplicity of tests may require very tiny $p$-values in order to reject any null hypotheses, and the common practice of using randomly sampled permutations then becomes very expensive. We propose an inexpensive approximation to $p$-values for two-sample linear test statistics, derived from Stolarsky's invariance principle. The method creates a geometrically derived set of approximate $p$-values for each hypothesis. The average of that set is used as a point estimate $\hat p$, and our generalization of the invariance principle allows us to compute the variance of the $p$-values in that set. We find that in cases where the point estimate is small, the variance is a modest multiple of the square of the point estimate, yielding a relative error property similar to that of saddlepoint approximations. On a Parkinson's disease data set, the new approximation is faster and more accurate than the saddlepoint approximation. We also obtain a simple probabilistic explanation of Stolarsky's invariance principle.
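
To make the cost of the baseline concrete, the following is a minimal sketch of a randomly sampled permutation test for a two-sample linear statistic. It is not the approximation proposed in the paper. The function name `permutation_p_value`, the choice of statistic (the sum of the observations in group 1), the one-sided comparison, and the add-one correction are all assumptions made for illustration.

```python
import random


def permutation_p_value(x, labels, n_perm=10_000, seed=0):
    """Monte Carlo permutation p-value for a two-sample linear statistic.

    Illustrative baseline only (hypothetical helper, not the paper's method).
    x      : list of numeric observations
    labels : list of 0/1 group indicators, same length as x
    The statistic is the sum of x over group 1, which is linear in the labels.
    """
    rng = random.Random(seed)
    observed = sum(xi for xi, g in zip(x, labels) if g == 1)

    perm = list(labels)  # copy so the caller's labels are not shuffled
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # random reassignment of group labels
        stat = sum(xi for xi, g in zip(x, perm) if g == 1)
        if stat >= observed:  # one-sided: as large or larger than observed
            hits += 1

    # Add-one correction: the estimate can never fall below 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    labels = [0] * 5 + [1] * 5  # group 1 holds the five largest values
    print(permutation_p_value(x, labels, n_perm=2000))
```

Because the estimate is bounded below by `1 / (n_perm + 1)`, resolving a $p$-value near $10^{-8}$ this way would take on the order of $10^{8}$ or more sampled permutations per hypothesis, which is the expense the abstract refers to.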
