149
v1v2 (latest)

An Analysis of the Johnson-Lindenstrauss Lemma with the Bivariate Gamma Distribution

Abstract

Probabilistic proofs of the Johnson-Lindenstrauss lemma imply that random projection can reduce the dimension of a data set and approximately preserve pairwise distances. If a distance being approximately preserved is called a success, and the complement of this event is called a failure, then such a random projection likely results in no failures. Assuming a Gaussian random projection, the lemma is proved by showing that the no-failure probability is positive using a combination of Bonferroni's inequality and Markov's inequality. This paper modifies this proof in two ways to obtain a greater lower bound on the no-failure probability. First, Bonferroni's inequality is applied to pairs of failures instead of individual failures. Second, since a pair of projection errors has a bivariate gamma distribution, the probability of a pair of successes is bounded using an inequality from Jensen (1969). If nn is the number of points to be embedded and μ\mu is the probability of a success, then this leads to an increase in the lower bound on the no-failure probability of 12(n2)(1μ)2\frac{1}{2}\binom{n}{2}(1-\mu)^2 if (n2)\binom{n}{2} is even and 12((n2)1)(1μ)2\frac{1}{2}\left(\binom{n}{2}-1\right)(1-\mu)^2 if (n2)\binom{n}{2} is odd. For example, if n=105n=10^5 points are to be embedded in k=104k=10^4 dimensions with a tolerance of ϵ=0.1\epsilon=0.1, then the improvement in the lower bound is on the order of 101410^{-14}. We also show that further improvement is possible if the inequality in Jensen (1969) extends to three successes, though we do not have a proof of this result.

View on arXiv
Comments on this paper