
Some Analysis of the Knockoff Filter and its Variants

Abstract

In many applications, we need to study a linear regression model that consists of a response variable and a large number of potential explanatory variables, and to determine which variables are truly associated with the response. In 2015, Barber and Candès introduced a new variable selection procedure called the knockoff filter to control the false discovery rate (FDR) and proved that this method achieves exact FDR control. In this paper, we provide some analysis of the knockoff filter and its variants. Based on our analysis, we propose a PCA prototype group selection filter that has exact group FDR control and several advantages over existing group selection methods for strongly correlated features. Another contribution is a new noise estimator that can be incorporated into the knockoff statistic from a penalized method without violating the exchangeability property. Our analysis also reveals that some knockoff statistics, including the Lasso path and the marginal correlation statistics, suffer from the alternating sign effect. To overcome this deficiency, we introduce the notion of a good statistic and propose several alternative statistics that take advantage of the good statistic property. Finally, we present a number of numerical experiments to demonstrate the effectiveness of our methods and confirm our analysis.
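For context, the final selection step of the knockoff filter can be sketched as follows. Each variable j receives a statistic W_j, where a large positive value is evidence that the variable is a true signal and null variables have symmetric signs. The filter chooses the smallest threshold t among the nonzero |W_j| such that the estimated false discovery proportion, (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}), is at most the target level q, with offset 0 for the knockoff filter and 1 for knockoff+. It then selects every j with W_j >= t. This is a minimal sketch of the standard thresholding rule from Barber and Candès, not this paper's code. The function names are hypothetical, and computing the W_j themselves (for example from a Lasso path) is assumed to happen elsewhere.

```python
from typing import List, Sequence


def knockoff_threshold(W: Sequence[float], q: float, plus: bool = True) -> float:
    """Data-dependent threshold of the knockoff (plus=False) or knockoff+ (plus=True) filter.

    Hypothetical helper: W holds precomputed knockoff statistics W_j.
    Returns float('inf') when no candidate threshold meets the target level q.
    """
    offset = 1 if plus else 0
    # Candidate thresholds are the distinct nonzero magnitudes |W_j|, scanned in increasing order.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        # Negative statistics below -t act as an estimate of the number of false discoveries.
        num = offset + sum(1 for w in W if w <= -t)
        den = max(1, sum(1 for w in W if w >= t))
        if num / den <= q:
            return t  # smallest t whose estimated FDP is at most q
    return float("inf")


def knockoff_select(W: Sequence[float], q: float, plus: bool = True) -> List[int]:
    """Indices j with W_j >= T, where T is the knockoff(+) threshold."""
    T = knockoff_threshold(W, q, plus)
    return [j for j, w in enumerate(W) if w >= T]


if __name__ == "__main__":
    # Toy statistics: ten clearly positive values and one small negative value.
    W = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, -0.5]
    print(knockoff_select(W, q=0.25, plus=True))
```

The extra 1 in the knockoff+ numerator is what yields exact FDR control, at the price of being more conservative. With very few variables, knockoff+ may select nothing at all.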
