Connecting model-based and model-free approaches to linear least squares regression

Abstract

In a regression setting with response vector $\mathbf{y} \in \mathbb{R}^n$ and given regressor vectors $\mathbf{x}_1,\ldots,\mathbf{x}_p \in \mathbb{R}^n$, a typical question is to what extent $\mathbf{y}$ is related to these regressor vectors, specifically, how well $\mathbf{y}$ can be approximated by a linear combination of them. Classical methods for this question are based on statistical models for the conditional distribution of $\mathbf{y}$, given the regressor vectors $\mathbf{x}_j$. Davies and Duembgen (2020) proposed a model-free approach in which all observation vectors $\mathbf{y}$ and $\mathbf{x}_j$ are viewed as fixed, and the quality of the least squares fit of $\mathbf{y}$ is quantified by comparing it with the least squares fit resulting from $p$ independent white noise regressor vectors. The purpose of the present note is to explain in a general context why the model-based and model-free approaches yield the same p-values, although the interpretation of these p-values differs between the two paradigms.
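The agreement of the two p-values can be checked numerically. The following is a minimal Monte Carlo sketch, not the authors' implementation. It assumes the simplest setting without an intercept, takes the model-based null hypothesis to be that $\mathbf{y}$ is standard Gaussian noise unrelated to the fixed regressors, and uses hypothetical toy data. The helper `rss_ratio` and all variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def rss_ratio(y, X):
    """Residual sum of squares of the least squares fit of y on X, divided by ||y||^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid) / float(y @ y)

# Hypothetical toy data (an assumption of this sketch, not taken from the paper).
n, p = 30, 3
X = rng.standard_normal((n, p))
y = 0.5 * X[:, 0] + rng.standard_normal(n)

ratio_obs = rss_ratio(y, X)
n_sim = 10_000

# Model-based view: the regressors X stay fixed, the response is redrawn
# as Gaussian noise under the assumed null model.
sim_model_based = np.array(
    [rss_ratio(rng.standard_normal(n), X) for _ in range(n_sim)]
)
p_model_based = float(np.mean(sim_model_based <= ratio_obs))

# Model-free view: the response y stays fixed, the regressors are replaced
# by p independent white noise vectors.
sim_model_free = np.array(
    [rss_ratio(y, rng.standard_normal((n, p))) for _ in range(n_sim)]
)
p_model_free = float(np.mean(sim_model_free <= ratio_obs))

print(f"model-based p-value (Monte Carlo): {p_model_based:.4f}")
print(f"model-free  p-value (Monte Carlo): {p_model_free:.4f}")
```

Both estimates approximate the same number up to simulation error. In this setting the ratio of the residual sum of squares to $\|\mathbf{y}\|^2$ follows a $\mathrm{Beta}((n-p)/2,\, p/2)$ distribution in either case, which is equivalent to the classical F-test with $p$ and $n-p$ degrees of freedom.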
