
Sparse Linear Regression is Easy on Random Supports

Main: 24 pages
Bibliography: 3 pages
Abstract

Sparse linear regression is one of the most basic questions in machine learning and statistics. Here, we are given as input a design matrix $X \in \mathbb{R}^{N \times d}$ and measurements or labels $y \in \mathbb{R}^N$ where $y = Xw^* + \xi$, and $\xi$ is the noise in the measurements. Importantly, we have the additional constraint that the unknown signal vector $w^*$ is sparse: it has $k$ non-zero entries where $k$ is much smaller than the ambient dimension $d$. Our goal is to output a prediction vector $\widehat{w}$ that has small prediction error: $\frac{1}{N}\|Xw^* - X\widehat{w}\|_2^2$.

Information-theoretically, we know what is best possible in terms of measurements: under most natural noise distributions, we can get prediction error at most $\epsilon$ with roughly $N = O(k \log d/\epsilon)$ samples. Computationally, this currently requires $d^{\Omega(k)}$ run-time. Alternately, with $N = O(d)$ samples, polynomial-time algorithms are known. Thus, there is an exponential gap (in the dependence on $d$) between the two regimes, and we do not know whether it is possible to get $d^{o(k)}$ run-time with $o(d)$ samples.

We give the first generic positive result for worst-case design matrices $X$: for any $X$, we show that if the support of $w^*$ is chosen at random, we can get prediction error $\epsilon$ with $N = \mathrm{poly}(k, \log d, 1/\epsilon)$ samples and run-time $\mathrm{poly}(d, N)$. This run-time holds for any design matrix $X$ with condition number up to $2^{\mathrm{poly}(d)}$.

Previously, such results were known for worst-case $w^*$, but only for random design matrices from well-behaved families, matrices with a very low condition number ($\mathrm{poly}(\log d)$; e.g., as studied in compressed sensing), or matrices with special structural properties.
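To make the setup concrete, the following is a minimal NumPy sketch of the problem instance and the prediction-error metric defined above. The dimensions, the noise level, and the support-restricted least-squares step are all illustrative assumptions; in particular, the "estimator" below is an oracle that knows the true support, not the paper's algorithm.

```python
import numpy as np

# Minimal sketch of the sparse linear regression setup from the abstract.
# All parameter values are illustrative assumptions, not from the paper.
rng = np.random.default_rng(0)

N, d, k = 200, 1000, 5            # samples, ambient dimension, sparsity
X = rng.standard_normal((N, d))   # a (here random, for demo) design matrix

# Sparse signal w* on a uniformly random support of size k,
# mirroring the random-support assumption in the abstract.
support = rng.choice(d, size=k, replace=False)
w_star = np.zeros(d)
w_star[support] = rng.standard_normal(k)

sigma = 0.1                       # assumed noise level
y = X @ w_star + sigma * rng.standard_normal(N)

# Stand-in oracle estimator: least squares restricted to the TRUE support.
# This is NOT the paper's algorithm; it only illustrates the error metric.
w_hat = np.zeros(d)
w_hat[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)

# Prediction error as defined above: (1/N) * ||X w* - X w_hat||_2^2
pred_err = np.linalg.norm(X @ (w_star - w_hat)) ** 2 / N
print(f"prediction error: {pred_err:.2e}")
```

Under the paper's guarantee, an efficient algorithm achieves small prediction error on instances of this form, with $N = \mathrm{poly}(k, \log d, 1/\epsilon)$ samples, for any fixed design $X$, provided the support of $w^*$ is random as sketched here.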
