
Knockoffs for exchangeable categorical covariates

Main:22 Pages
6 Figures
Bibliography:2 Pages
Abstract

Let $X=(X_1,\ldots,X_p)$ be a $p$-variate random vector and $F$ a fixed finite set. In a number of applications, mainly in genetics, it turns out that $X_i\in F$ for each $i=1,\ldots,p$. Despite the latter fact, to obtain a knockoff $\widetilde{X}$ (in the sense of \cite{CFJL18}), $X$ is usually modeled as an absolutely continuous random vector. While comprehensible from the point of view of applications, this approximate procedure does not make sense theoretically, since $X$ is supported by the finite set $F^p$. In this paper, explicit formulae for the joint distribution of $(X,\widetilde{X})$ are provided when $P(X\in F^p)=1$ and $X$ is exchangeable or partially exchangeable. In fact, when $X_i\in F$ for all $i$, there seem to be various reasons for assuming $X$ exchangeable or partially exchangeable. The robustness of $\widetilde{X}$ with respect to the de Finetti measure $\pi$ of $X$ is investigated as well. Let $\mathcal{L}_\pi(\widetilde{X}\mid X=x)$ denote the conditional distribution of $\widetilde{X}$, given $X=x$, when the de Finetti measure is $\pi$. It is shown that $\|\mathcal{L}_{\pi_1}(\widetilde{X}\mid X=x)-\mathcal{L}_{\pi_2}(\widetilde{X}\mid X=x)\|\le c(x)\,\|\pi_1-\pi_2\|$, where $\|\cdot\|$ is the total variation distance and $c(x)$ is a suitable constant. Finally, a numerical experiment is performed. Overall, the knockoffs of this paper outperform the alternatives (i.e., the knockoffs obtained by giving $X$ an absolutely continuous distribution) as regards the false discovery rate but are slightly weaker in terms of power.
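
To make the setting concrete, here is a minimal Python sketch of one way a knockoff can be drawn for exchangeable categorical data. It is not taken from the paper, and the abstract does not say that the paper's explicit formulae have this form. It assumes that the coordinates of $X$ are i.i.d. given a random probability vector $\theta$ on $F$ and that the de Finetti measure $\pi$ is a Dirichlet distribution. Both are assumptions of this illustration, as are the names `sample_knockoff`, `categories` and `alpha`. Under these assumptions, drawing $\theta$ from its posterior given $X=x$ and then sampling the coordinates of $\widetilde{X}$ i.i.d. from $\theta$ makes all $2p$ coordinates of $(X,\widetilde{X})$ exchangeable, which in particular gives the swap invariance required of a knockoff.

```python
import numpy as np

def sample_knockoff(x, categories, alpha, rng):
    """Draw one knockoff copy of x, assuming a Dirichlet(alpha) de Finetti measure.

    Hypothetical helper for illustration only; not the paper's construction.
    """
    x = np.asarray(x)
    categories = np.asarray(categories)
    # Count how often each category of F occurs among the coordinates of x.
    counts = np.array([(x == c).sum() for c in categories])
    # Dirichlet-multinomial conjugacy: theta | X = x ~ Dirichlet(alpha + counts).
    theta = rng.dirichlet(np.asarray(alpha, dtype=float) + counts)
    # Given theta, the knockoff coordinates are i.i.d. draws from theta.
    idx = rng.choice(len(categories), size=x.shape[0], p=theta)
    return categories[idx]

rng = np.random.default_rng(0)
categories = [0, 1, 2]          # the finite set F (e.g. genotype codes)
alpha = np.ones(3)              # assumed Dirichlet parameter of pi
x = np.array([0, 0, 1, 2, 0, 1])
x_tilde = sample_knockoff(x, categories, alpha, rng)
```

Because `x_tilde` depends on the data only through `x`, it is conditionally independent of any response given `x`. Changing `alpha` changes the assumed de Finetti measure, which is the kind of perturbation the robustness bound in the abstract is concerned with.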
