
On the Randomized Complexity of Minimizing a Convex Quadratic Function

Max Simchowitz
Abstract

Minimizing a convex, quadratic objective of the form $f_{\mathbf{A},\mathbf{b}}(x) := \frac{1}{2}x^\top \mathbf{A} x - \langle \mathbf{b}, x \rangle$ for $\mathbf{A} \succ 0$ is a fundamental problem in machine learning and optimization. In this work, we prove gradient-query complexity lower bounds for minimizing convex quadratic functions which apply to both deterministic and \emph{randomized} algorithms. Specifically, for $\kappa > 1$, we exhibit a distribution over $(\mathbf{A},\mathbf{b})$ with condition number $\mathrm{cond}(\mathbf{A}) \le \kappa$, such that any \emph{randomized} algorithm requires $\Omega(\sqrt{\kappa})$ gradient queries to find a solution $\hat x$ for which $\|\hat x - \mathbf x_\star\| \le \epsilon_0\|\mathbf{x}_{\star}\|$, where $\mathbf x_{\star} = \mathbf{A}^{-1}\mathbf{b}$ is the optimal solution, and $\epsilon_0$ a small constant. Setting $\kappa = 1/\epsilon$, this lower bound implies the minimax rate of $T = \Omega(\lambda_1(\mathbf{A})\|\mathbf x_\star\|^2/\sqrt{\epsilon})$ queries required to minimize an arbitrary convex quadratic function up to error $f(\hat{x}) - f(\mathbf x_\star) \le \epsilon$. Our lower bound holds for a distribution derived from classical ensembles in random matrix theory, and relies on a careful reduction from adaptively estimating a planted vector $\mathbf u$ in a deformed Wigner model. A key step in deriving sharp lower bounds is demonstrating that the optimization error $\mathbf x_\star - \hat x$ cannot align too closely with $\mathbf{u}$. To this end, we prove an upper bound on the cosine between $\mathbf x_\star - \hat x$ and $\mathbf u$ in terms of the MMSE of estimating the plant $\mathbf u$ in a deformed Wigner model. We then bound the MMSE by carefully modifying a result due to Lelarge and Miolane (2016), which rigorously establishes a general replica-symmetric formula for planted matrix models.
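To make the gradient-query model concrete, the following is a minimal sketch of the upper-bound side only. It is not the hard distribution or the reduction used in this work. It assumes a generic random instance with prescribed condition number, and it swaps in the standard conjugate gradient method, which is known to need on the order of $\sqrt{\kappa}$ gradient queries (up to a logarithmic factor in the target accuracy) to reach the relative-error criterion $\|\hat x - \mathbf x_\star\| \le \epsilon_0\|\mathbf x_\star\|$. The names `make_instance`, `GradientOracle`, and `conjugate_gradient` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def make_instance(d, kappa, seed=0):
    """Illustrative instance (an assumption, not the paper's ensemble):
    A = Q diag(eigs) Q^T with eigenvalues spread over [1, kappa]."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal basis
    eigs = np.linspace(1.0, kappa, d)                 # cond(A) = kappa
    A = (Q * eigs) @ Q.T
    b = rng.standard_normal(d)
    return A, b


class GradientOracle:
    """Returns grad f(x) = A x - b and counts how many times it is queried."""

    def __init__(self, A, b):
        self.A, self.b, self.queries = A, b, 0

    def __call__(self, x):
        self.queries += 1
        return self.A @ x - self.b


def conjugate_gradient(oracle, d, num_steps):
    """Conjugate gradient that touches A only through gradient queries.
    Uses 1 + num_steps queries in total."""
    x = np.zeros(d)
    g0 = oracle(np.zeros(d))      # grad f(0) = -b
    r = -g0                       # residual b - A x at x = 0
    p = r.copy()
    for _ in range(num_steps):
        Ap = oracle(p) - g0       # A p = grad f(p) - grad f(0)
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x


# Usage: kappa = 100, so sqrt(kappa) = 10; a small multiple of that many
# queries is enough for a small relative error on this instance.
A, b = make_instance(d=200, kappa=100.0)
oracle = GradientOracle(A, b)
x_hat = conjugate_gradient(oracle, d=200, num_steps=60)
x_star = np.linalg.solve(A, b)
rel_err = np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star)
print(f"queries = {oracle.queries}, relative error = {rel_err:.2e}")
```

The oracle counter is the quantity the lower bound speaks about: the result stated above says that no algorithm, randomized or not, can drive the count below order $\sqrt{\kappa}$ on the hard distribution, whereas this sketch only shows a matching method on a benign instance.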
