Tree-Projected Gradient Descent for Estimating Gradient-Sparse Parameters on Graphs

Abstract

We study estimation of a gradient-sparse parameter vector $\boldsymbol{\theta}^* \in \mathbb{R}^p$, having strong gradient-sparsity $s^* := \|\nabla_G \boldsymbol{\theta}^*\|_0$ on an underlying graph $G$. Given observations $Z_1, \ldots, Z_n$ and a smooth, convex loss function $\mathcal{L}$ for which $\boldsymbol{\theta}^*$ minimizes the population risk $\mathbb{E}[\mathcal{L}(\boldsymbol{\theta}; Z_1, \ldots, Z_n)]$, we propose to estimate $\boldsymbol{\theta}^*$ by a projected gradient descent algorithm that iteratively and approximately projects gradient steps onto spaces of vectors having small gradient-sparsity over low-degree spanning trees of $G$. We show that, under suitable restricted strong convexity and smoothness assumptions for the loss, the resulting estimator achieves the squared-error risk $\frac{s^*}{n} \log\left(1 + \frac{p}{s^*}\right)$ up to a multiplicative constant that is independent of $G$. In contrast, previous polynomial-time algorithms have only been shown to achieve this guarantee in more specialized settings, or under additional assumptions on $G$ and/or the sparsity pattern of $\nabla_G \boldsymbol{\theta}^*$. As applications of our general framework, we study the examples of linear models and generalized linear models with random design.
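
For intuition, here is a minimal sketch of the iteration structure the abstract describes: a gradient step followed by an (approximate) projection onto vectors with small gradient-sparsity over a spanning tree of $G$. Everything below is illustrative, not the paper's method: the names `tree_project` and `tree_projected_gd` are hypothetical, the spanning tree is a stand-in (the paper constructs low-degree spanning trees), and the greedy "cut the $s$ largest tree-edge differences, then average within components" projection is a simple heuristic, whereas the paper's projection subroutine carries the stated guarantees.

```python
import numpy as np
import networkx as nx

def tree_project(z, T, s):
    """Heuristic projection of z onto {theta : ||grad_T theta||_0 <= s}.

    Cuts the s tree edges with the largest endpoint differences, then sets
    theta to the mean of z on each resulting connected component (the exact
    projection once the cut set is fixed). Assumes nodes are labeled 0..p-1.
    """
    edges = list(T.edges())
    diffs = np.array([abs(z[u] - z[v]) for u, v in edges])
    cut = set(np.argsort(diffs)[::-1][:s])  # indices of the s largest jumps
    H = nx.Graph()
    H.add_nodes_from(T.nodes())
    H.add_edges_from(e for i, e in enumerate(edges) if i not in cut)
    theta = np.empty_like(z)
    for comp in nx.connected_components(H):
        idx = list(comp)
        theta[idx] = z[idx].mean()
    return theta

def tree_projected_gd(grad_loss, theta0, G, s, step=0.1, n_iter=200):
    """Projected gradient descent: gradient step, then tree projection."""
    T = nx.minimum_spanning_tree(G)  # stand-in for a low-degree spanning tree
    theta = theta0.copy()
    for _ in range(n_iter):
        theta = tree_project(theta - step * grad_loss(theta), T, s)
    return theta

# Toy usage: denoise a piecewise-constant signal on a path graph,
# i.e. squared-error loss 0.5 * ||theta - y||^2 with gradient theta - y.
p = 100
G = nx.path_graph(p)
theta_star = np.repeat([0.0, 2.0, -1.0, 1.0], p // 4)  # s* = 3 jumps
y = theta_star + 0.3 * np.random.randn(p)
theta_hat = tree_projected_gd(lambda th: th - y, np.zeros(p), G, s=3, step=1.0)
```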
