Generating knockoffs via conditional independence

Abstract

Let $X$ be a $p$-variate random vector and $\widetilde{X}$ a knockoff copy of $X$ (in the sense of \cite{CFJL18}). A new approach for constructing $\widetilde{X}$ (henceforth, NA) has been introduced in \cite{JSPI}. NA has essentially three advantages: (i) building $\widetilde{X}$ is straightforward; (ii) the joint distribution of $(X,\widetilde{X})$ can be written in closed form; (iii) $\widetilde{X}$ is often optimal under various criteria. However, for NA to apply, $X_1,\ldots,X_p$ should be conditionally independent given some random element $Z$. Our first result is that any probability measure $\mu$ on $\mathbb{R}^p$ can be approximated by a probability measure $\mu_0$ of the form
\[
\mu_0\bigl(A_1\times\ldots\times A_p\bigr)=E\Bigl\{\prod_{i=1}^p P(X_i\in A_i\mid Z)\Bigr\}.
\]
The approximation is in total variation distance when $\mu$ is absolutely continuous, and an explicit formula for $\mu_0$ is provided. If $X\sim\mu_0$, then $X_1,\ldots,X_p$ are conditionally independent given $Z$. Hence, with a negligible error, one can assume $X\sim\mu_0$ and build $\widetilde{X}$ through NA. Our second result is a characterization of the knockoffs $\widetilde{X}$ obtained via NA. It is shown that $\widetilde{X}$ is of this type if and only if the pair $(X,\widetilde{X})$ can be extended to an infinite sequence so as to satisfy certain invariance conditions. The basic tool for proving this fact is de Finetti's theorem for partially exchangeable sequences. In addition to the quoted results, an explicit formula for the conditional distribution of $\widetilde{X}$ given $X$ is obtained in a few cases. In one such case, it is assumed that $X_i\in\{0,1\}$ for all $i$.
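To make the conditional-independence setting concrete, the following is a minimal Python sketch of a toy binary model. Everything in it is an assumption made for illustration and is not taken from the paper: the latent $Z$ is Beta distributed, the coordinates of $X$ are i.i.d. Bernoulli($Z$) given $Z$, and the candidate knockoff is a second conditionally independent draw given the same $Z$. The names `sample_pair` and `swap_gap` are hypothetical. The check covers only one necessary consequence of the swap invariance required of knockoffs, so it is a sanity check rather than a verification.

```python
import random


def sample_pair(p, rng, a=2.0, b=3.0):
    """Draw (X, X_tilde) with binary coordinates under an assumed toy model."""
    # Assumed latent element: Z ~ Beta(a, b) (illustrative choice only).
    z = rng.betavariate(a, b)
    # Given Z, the coordinates of X are i.i.d. Bernoulli(Z).
    x = [int(rng.random() < z) for _ in range(p)]
    # Candidate knockoff: a fresh draw, conditionally independent of X given Z.
    x_tilde = [int(rng.random() < z) for _ in range(p)]
    return x, x_tilde


def swap_gap(n, p, seed=0):
    """Empirical |P(X_1=1, Xt_1=0) - P(X_1=0, Xt_1=1)| over n draws.

    Swapping X_1 with its knockoff should leave the joint law unchanged,
    so this gap should be close to zero.
    """
    rng = random.Random(seed)
    n10 = n01 = 0
    for _ in range(n):
        x, xt = sample_pair(p, rng)
        n10 += (x[0] == 1 and xt[0] == 0)
        n01 += (x[0] == 0 and xt[0] == 1)
    return abs(n10 - n01) / n


if __name__ == "__main__":
    print(swap_gap(20000, 3))
```

In this toy model all $2p$ coordinates are exchangeable, which is stronger than what a knockoff requires; in general the conditional law of each $X_i$ given $Z$ may differ across $i$.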
