
Generating knockoffs via conditional independence

Electronic Journal of Statistics (EJS), 2022
Abstract

Let $X$ be a $p$-variate random vector and $\widetilde{X}$ a knockoff copy of $X$ (in the sense of \cite{CFJL18}). A new approach for constructing $\widetilde{X}$ (henceforth, NA) has been introduced in \cite{JSPI}. NA has essentially three advantages: (i) To build $\widetilde{X}$ is straightforward; (ii) The joint distribution of $(X,\widetilde{X})$ can be written in closed form; (iii) $\widetilde{X}$ is often optimal under various criteria, including mean absolute correlation and reconstructability. However, for NA to apply, the distribution of $X$ needs to be of the form
\[
(*)\qquad P(X_1\in A_1,\ldots,X_p\in A_p)=E\Bigl\{\prod_{i=1}^pP(X_i\in A_i\mid Z)\Bigr\}
\]
for some random element $Z$. Our first result is that any probability measure $\mu$ on $\mathbb{R}^p$ can be approximated by a probability measure $\mu_0$ which makes condition (*) true. If $\mu$ is absolutely continuous, the approximation holds in total variation distance. In applications, regarding $\mu$ as the distribution of $X$, this result suggests using the knockoffs based on $\mu_0$ instead of those based on $\mu$ (which are generally unknown). Our second result is a characterization of the pairs $(X,\widetilde{X})$ where $\widetilde{X}$ is obtained via NA. It turns out that $(X,\widetilde{X})$ is of this type if and only if it can be extended to an infinite sequence so as to satisfy certain invariance conditions. The basic tool for proving this fact is de Finetti's theorem for partially exchangeable sequences.
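
To illustrate advantage (ii), suppose (as condition (*) suggests, though this is an assumption and not a statement quoted from \cite{JSPI}) that NA draws the coordinates of $\widetilde{X}$ conditionally independently given $Z$, each with the same conditional law as the corresponding coordinate of $X$. The joint distribution of $(X,\widetilde{X})$ would then take the following form, where the sets $B_1,\ldots,B_p$ are notation introduced here for the sketch:

```latex
\[
P\bigl(X_1\in A_1,\ldots,X_p\in A_p,\ \widetilde{X}_1\in B_1,\ldots,\widetilde{X}_p\in B_p\bigr)
= E\Bigl\{\prod_{i=1}^p P(X_i\in A_i\mid Z)\,P(X_i\in B_i\mid Z)\Bigr\}
\]
```

Under this form, swapping $X_i$ with $\widetilde{X}_i$ for any set of indices $i$ only exchanges $A_i$ and $B_i$ in the corresponding factors, so the right-hand side is unchanged, which is the exchangeability property required of a knockoff copy. Taking $B_i=\mathbb{R}$ for every $i$ recovers condition (*).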
