Online Active Linear Regression via Thresholding

9 February 2016

Abstract

We study online active learning as a way to collect data for regression modeling. Specifically, we consider a decision maker who faces a limited experimentation budget but must efficiently learn an underlying linear population model. Our goal is to develop algorithms that provide substantial gains over passive random sampling of observations. To that end, our main contribution is a novel threshold-based algorithm for selecting observations; we characterize its performance and derive related lower bounds. We also apply our approach successfully to regularized regression. Simulations suggest the algorithm is remarkably robust: it provides significant benefits over passive random sampling even on several real-world datasets that exhibit high nonlinearity and high dimensionality, significantly reducing both the mean and the variance of the squared error.
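The abstract does not spell out the selection rule, so the following is only a minimal sketch of what threshold-based selection could look like, not the paper's algorithm. It assumes covariates arrive one at a time, that a label is requested only when the covariate's Euclidean norm reaches a fixed threshold, and that ordinary least squares is fit on the labeled points. The names `threshold_select` and `passive_select`, the norm-based rule, and the fixed `threshold` parameter are all illustrative assumptions.

```python
import numpy as np


def threshold_select(X, y, budget, threshold):
    """Stream over rows of X and label a row only if its norm is >= threshold.

    Assumed rule (hypothetical, for illustration): a fixed norm threshold.
    Stops once `budget` labels have been requested, then fits OLS.
    """
    chosen = []
    for i, x in enumerate(X):
        if len(chosen) >= budget:
            break
        if np.linalg.norm(x) >= threshold:
            chosen.append(i)
    idx = np.array(chosen, dtype=int)
    # Least-squares fit restricted to the labeled observations.
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return idx, beta


def passive_select(X, y, budget):
    """Baseline: label the first `budget` rows, i.e. passive sampling of a stream."""
    idx = np.arange(min(budget, len(X)))
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return idx, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, budget = 5, 5000, 50
    beta_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ beta_true + rng.normal(size=n)

    _, b_active = threshold_select(X, y, budget, threshold=3.0)
    _, b_passive = passive_select(X, y, budget)
    print("active  squared error:", np.sum((b_active - beta_true) ** 2))
    print("passive squared error:", np.sum((b_passive - beta_true) ** 2))
```

The intuition behind this sketch is that large-norm covariates carry more information about the coefficients under a linear model, so spending the budget on them can shrink estimation error relative to labeling whatever arrives first. How the threshold should be set for a given budget and covariate distribution is exactly the kind of question the paper addresses and is not resolved here.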
