Minimax Subsampling for Estimation and Prediction in Low-Dimensional Linear Regression

Abstract
We derive subsampling strategies that select a small portion of the design (data) points in a low-dimensional linear regression model while attaining near-optimal statistical rates. Our results apply both to estimating the underlying linear model and to predicting the real-valued response of a new data point. The derived subsampling strategies are minimax optimal under the fixed design setting, up to a small relative factor. We also give interpretable subsampling probabilities for the random design setting and demonstrate explicit gaps in statistical rates between optimal and baseline (e.g., uniform) subsampling methods.
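To make the setting concrete, the following is a minimal sketch of the uniform baseline mentioned above, not the paper's minimax strategy, which the abstract does not specify. It draws a uniform subsample of k design points, fits ordinary least squares on it, and then evaluates both tasks: estimation error against the true coefficients and the predicted response at a new point. The helper names (`solve`, `ols`, `uniform_subsample_ols`), the synthetic Gaussian design, and all parameter values are assumptions chosen for illustration.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (adequate for small p)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def uniform_subsample_ols(X, y, k, rng):
    """Baseline: pick k design points uniformly without replacement, then fit OLS."""
    idx = rng.sample(range(len(X)), k)
    return ols([X[i] for i in idx], [y[i] for i in idx])

# Synthetic low-dimensional problem (all values are illustrative assumptions).
rng = random.Random(0)
n, p, k = 2000, 3, 100
beta = [1.0, -2.0, 0.5]
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0, 0.1) for row in X]

beta_hat = uniform_subsample_ols(X, y, k, rng)

# Estimation task: Euclidean distance between fitted and true coefficients.
est_err = sum((a - b) ** 2 for a, b in zip(beta_hat, beta)) ** 0.5

# Prediction task: response predicted for a new data point.
x_new = [0.3, -1.2, 2.0]
y_pred = sum(b * x for b, x in zip(beta_hat, x_new))
```

A non-uniform strategy would replace only the index selection inside `uniform_subsample_ols`; the fitting and evaluation steps stay the same, which is what makes a rate comparison against the uniform baseline meaningful.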