A Note on High Dimensional Linear Regression with Interactions

The problem of interaction selection has recently attracted much attention in high-dimensional data analysis. This note aims to address and clarify several fundamental issues in interaction selection for linear regression models, especially when the input dimension p is much larger than the sample size n. We first discuss issues such as a valid way of defining importance for the main effects and interaction effects, the invariance principle, and the strong heredity condition. We then focus on two-stage methods, which are computationally attractive for large-p problems but are regarded as heuristic in the literature. We revisit the counterexample of Turlach (2004) and provide new insight to justify two-stage methods from a theoretical perspective. Finally, we suggest some new strategies for interaction selection under the marginality principle, followed by a numerical example.