A theoretical treatment of conditional independence testing under Model-X

12 May 2020

Abstract

For testing conditional independence (CI) of a response $Y$ and a predictor $X$ given covariates $Z$, the recently introduced model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their successful application to genome-wide association studies. In this paper, we build a theoretical foundation for the MX CI problem, yielding quantitative explanations for empirically observed phenomena and novel insights to guide the design of MX methodology. We focus our analysis on the conditional randomization test (CRT), whose validity conditional on $Y,Z$ allows us to view it as a test of a point null hypothesis involving the conditional distribution of $X$. We use the Neyman-Pearson lemma to derive the most powerful CRT statistic against a point alternative as well as an analogous result for MX knockoffs. We define CRT-style analogs of $t$- and $F$-tests with explicit critical values, and show that they have uniform asymptotic Type-I error control under the assumption that only the first two moments of $X$ given $Z$ are known, a significant relaxation of MX. We derive expressions for the power of these tests against local semiparametric alternatives using Le Cam's local asymptotic normality theory, explicitly capturing the prediction error of the underlying learning algorithm. Finally, we pave the way for estimation in the MX setting by drawing connections to semiparametric statistics and causal inference. Thus, this work forms explicit bridges from MX to both classical statistics (testing) and modern causal inference (estimation).
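To make the CRT mechanics concrete, here is a minimal sketch of the generic resampling scheme, assuming an exact sampler for the law of $X$ given $Z$ is available (the MX assumption). Holding $Y$ and $Z$ fixed, fresh copies of $X$ are drawn from that law and the observed statistic is ranked among the resampled ones, which is why the test can be read as a test of a point null about $X \mid Z$. The toy Gaussian model, the coefficient vector `gamma`, and the covariance-type statistic are hypothetical illustrations only; they are not the most powerful statistic or the $t$-/$F$-test analogs derived in the paper.

```python
import numpy as np


def crt_pvalue(y, x, z, sample_x_given_z, statistic, n_resamples=500, seed=None):
    """Conditional randomization test p-value (illustrative sketch).

    sample_x_given_z(z, rng) is a user-supplied (hypothetical) sampler that
    draws a fresh copy of X from its assumed-known conditional law given Z.
    """
    rng = np.random.default_rng(seed)
    t_obs = statistic(y, x, z)
    # Resample X given Z while holding Y and Z fixed.
    t_null = np.array(
        [statistic(y, sample_x_given_z(z, rng), z) for _ in range(n_resamples)]
    )
    # Counting the observed statistic among the resamples gives a valid p-value.
    return (1.0 + np.sum(t_null >= t_obs)) / (1.0 + n_resamples)


# Hypothetical toy MX setup: X | Z ~ N(Z @ gamma, 1) with gamma known.
gamma = np.array([1.0, 0.5])


def sample_x_given_z(z, rng):
    return z @ gamma + rng.standard_normal(z.shape[0])


def cov_statistic(y, x, z):
    # |sum of Y times (X minus its known conditional mean given Z)|
    return abs(np.sum((x - z @ gamma) * y))


rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal((n, 2))
x = sample_x_given_z(z, rng)
y_alt = 3.0 * x + rng.standard_normal(n)   # Y depends on X: CI fails
y_null = z[:, 0] + rng.standard_normal(n)  # Y depends on Z only: CI holds

p_alt = crt_pvalue(y_alt, x, z, sample_x_given_z, cov_statistic, seed=1)
p_null = crt_pvalue(y_null, x, z, sample_x_given_z, cov_statistic, seed=1)
```

With a strong dependence of $Y$ on $X$, `p_alt` should sit at or near its smallest attainable value $1/(1+B)$ for $B$ resamples, while `p_null` behaves like a draw from an (approximately) uniform distribution on its grid of attainable values.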
