
How many moments does MMD compare?

Abstract

We present a new way to study Mercer kernels: to a special kernel $K$ we associate a pseudo-differential operator $p({\mathbf x}, D)$ such that $\mathcal{F} p({\mathbf x}, D)^\dag p({\mathbf x}, D) \mathcal{F}^{-1}$ acts on smooth functions in the same way as the integral operator associated with $K$ (where $\mathcal{F}$ is the Fourier transform). We show that kernels defined by pseudo-differential operators can uniformly approximate any continuous Mercer kernel on a compact set. The symbol $p({\mathbf x}, {\mathbf y})$ encapsulates a lot of useful information about the structure of the Maximum Mean Discrepancy (MMD) distance defined by the kernel $K$. We approximate $p({\mathbf x}, {\mathbf y})$ with the sum of the first $r$ terms of the singular value decomposition of $p$, denoted by $p_r({\mathbf x}, {\mathbf y})$. If the ordered singular values of the integral operator associated with $p({\mathbf x}, {\mathbf y})$ decay rapidly, the MMD distance defined by the new symbol $p_r$ differs only slightly from the initial one. Moreover, the new MMD distance can be interpreted as an aggregated result of comparing $r$ local moments of two probability distributions. The latter result holds under the condition that the right singular vectors of the integral operator associated with $p$ are uniformly bounded. Even if this condition is not satisfied, it still holds that the Hilbert-Schmidt distance between $p$ and $p_r$ vanishes. Thus, we report an interesting phenomenon: the MMD distance measures the difference between two probability distributions with respect to a certain number of local moments, $r^\ast$, and this number $r^\ast$ depends on the speed with which the singular values of $p$ decay.
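The truncation step can be illustrated numerically. The following is only a minimal finite-dimensional sketch, not the paper's construction: it assumes a hypothetical smooth symbol sampled on a grid (the Gaussian-shaped `P` and the helper `truncate_symbol` are illustrative choices, not taken from the paper), so that the integral operator becomes a matrix and the Hilbert-Schmidt norm becomes the Frobenius norm. It checks the standard fact that the error of a rank-$r$ truncated SVD equals the root of the sum of squared discarded singular values, which is why fast singular value decay makes $p_r$ close to $p$.

```python
import numpy as np

def truncate_symbol(P, r):
    """Return the rank-r truncated SVD of the matrix P and all singular values of P."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    # Sum of the first r rank-one terms s_k * u_k * v_k^T.
    P_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
    return P_r, s

# Hypothetical symbol p(x, y), sampled on a grid (an assumption for illustration).
x = np.linspace(-1.0, 1.0, 60)
y = np.linspace(-1.0, 1.0, 50)
P = np.exp(-(x[:, None] - y[None, :]) ** 2)

r = 5
P_r, s = truncate_symbol(P, r)

# Discrete analogue of the Hilbert-Schmidt distance between p and p_r.
hs_error = np.linalg.norm(P - P_r)
# Root of the sum of squared discarded singular values.
tail = np.sqrt(np.sum(s[r:] ** 2))

print("singular values:", s[:8])
print("HS error:", hs_error, "tail norm:", tail)
```

In this toy setting the two printed quantities agree up to floating-point error, and increasing `r` can only shrink the error, mirroring the role the abstract assigns to the decay speed of the singular values.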
