
High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees

Abstract

We study sparse linear regression over a network of agents, modeled as an undirected graph with no server node. The estimation of the $s$-sparse parameter is formulated as a constrained LASSO problem in which each agent owns a subset of the $N$ total observations. We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm under high-dimensional scaling, allowing the ambient dimension $d$ to grow with (and possibly exceed) the sample size $N$. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions and suitable conditions on the network connectivity and algorithm tuning, the distributed algorithm converges globally at a linear rate to an estimate that is within the centralized statistical precision of the model, $\mathcal{O}(s\log d/N)$. When $s\log d/N=o(1)$, a condition necessary for statistical consistency, an $\varepsilon$-optimal solution is attained after $\mathcal{O}(\kappa \log (1/\varepsilon))$ gradient computations and $\mathcal{O}\big((\kappa/(1-\rho)) \log (1/\varepsilon)\big)$ communication rounds, where $\kappa$ is the restricted condition number of the loss function and $\rho$ measures the network connectivity. The computation cost matches that of the centralized projected gradient algorithm even though the data are distributed, whereas the number of communication rounds decreases as the network connectivity improves. Overall, our study reveals interesting connections between statistical efficiency, network connectivity and topology, and convergence rate in high dimensions.
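
For concreteness, the constrained LASSO problem described above can be written roughly as follows. This is an illustrative sketch, not the paper's exact statement: the number of agents $m$, the equal split of $n = N/m$ observations per agent, the local data $(X_i, y_i)$, and the $\ell_1$-ball radius $r$ are notational assumptions introduced here.

```latex
% Illustrative formulation; m, n, r, X_i, y_i are assumed notation.
\begin{aligned}
  \hat{\theta} \in \operatorname*{arg\,min}_{\theta \in \mathbb{R}^d,\ \|\theta\|_1 \le r}
    \ & \frac{1}{m} \sum_{i=1}^{m} f_i(\theta), \\
  f_i(\theta) \ &= \frac{1}{2n} \,\| y_i - X_i \theta \|_2^2,
    \qquad X_i \in \mathbb{R}^{n \times d},\ \ y_i \in \mathbb{R}^{n},\ \ N = mn .
\end{aligned}
```

In this notation each agent $i$ can evaluate only its own $f_i$ and its gradient, so the network as a whole has to minimize the average loss through local computation and exchanges with neighbors.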
