Connecting model-based and model-free approaches to linear least squares regression

25 July 2018

Abstract

In a regression setting with response vector $\mathbf{y} \in \mathbb{R}^n$ and given regressor vectors $\mathbf{x}_1,\ldots,\mathbf{x}_p \in \mathbb{R}^n$, a typical question is to what extent $\mathbf{y}$ is related to these regressor vectors, specifically, how well $\mathbf{y}$ can be approximated by a linear combination of them. Classical methods for this question are based on statistical models for the conditional distribution of $\mathbf{y}$, given the regressor vectors $\mathbf{x}_j$. Davies and Duembgen (2020) proposed a model-free approach in which all observation vectors $\mathbf{y}$ and $\mathbf{x}_j$ are viewed as fixed, and the quality of the least squares fit of $\mathbf{y}$ is quantified by comparing it with the least squares fit resulting from $p$ independent white noise regressor vectors. The purpose of the present note is to explain in a general context why the model-based and model-free approaches yield the same p-values, although the interpretation of these p-values differs between the two paradigms.
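The agreement of the two p-values can be checked numerically. The following is a minimal Monte Carlo sketch, not the authors' implementation. It assumes the simplest setting, with no intercept and all $p$ regressors tested at once, and it uses synthetic data. The function names `rss_ratio`, `model_free_p_value` and `model_based_p_value` are hypothetical, chosen for illustration. The model-free version keeps $\mathbf{y}$ fixed and replaces the regressors by Gaussian white noise; the model-based version keeps the regressors fixed and simulates $\mathbf{y}$ as Gaussian white noise, which is the null model behind the classical F-test.

```python
import numpy as np


def rss_ratio(y, X):
    """Residual sum of squares of the least squares fit of y on X, divided by ||y||^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid) / float(y @ y)


def model_free_p_value(y, X, n_sim, rng):
    """y is fixed; the p regressors are replaced by independent white noise vectors."""
    n, p = X.shape
    observed = rss_ratio(y, X)
    hits = sum(
        rss_ratio(y, rng.standard_normal((n, p))) <= observed
        for _ in range(n_sim)
    )
    return hits / n_sim


def model_based_p_value(y, X, n_sim, rng):
    """X is fixed; y is simulated as Gaussian white noise (assumed null model)."""
    n, _ = X.shape
    observed = rss_ratio(y, X)
    hits = sum(
        rss_ratio(rng.standard_normal(n), X) <= observed
        for _ in range(n_sim)
    )
    return hits / n_sim


# Synthetic example data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.standard_normal((n, p))
y = 0.3 * X[:, 0] + rng.standard_normal(n)

p_free = model_free_p_value(y, X, n_sim=4000, rng=rng)
p_based = model_based_p_value(y, X, n_sim=4000, rng=rng)
print(f"model-free p-value:  {p_free:.3f}")
print(f"model-based p-value: {p_based:.3f}")
```

A small ratio of residual to total sum of squares corresponds to a large F statistic, so the model-based simulation approximates the usual F-test p-value. In this setting both simulated quantities follow the same Beta distribution with parameters $(n-p)/2$ and $p/2$, so the two estimates should agree up to Monte Carlo error.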
