
Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems

IEEE Annual Symposium on Foundations of Computer Science (FOCS), 2024
Abstract

In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $\epsilon>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $\epsilon \geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+\delta}$ bits of memory or must make at least $1/(d^{0.01\delta}\epsilon^{2\frac{1-\delta}{1+1.01\delta}-o(1)})$ oracle queries, for any $\delta\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+\delta}$ memory or make at least $1/(d^{2\delta}\epsilon^{2(1-4\delta)-o(1)})$ queries for any $\delta\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal{O}(d\ln 1/\epsilon)$ but makes $\Omega(1/\epsilon^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/\epsilon$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition since with quadratic $\mathcal{O}(d^2\ln 1/\epsilon)$ memory, cutting plane methods only require $\mathcal{O}(d\ln 1/\epsilon)$ queries.
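
To make the gradient descent side of the tradeoff concrete, the following is a minimal sketch of the standard textbook scheme for feasibility with a separation oracle; it is not taken from the paper. It assumes the oracle returns either `None` (the query is feasible) or a unit vector $g$ with $\langle g, x - y\rangle \geq 0$ for every $y$ in the set. Under that assumption, a step of size $\epsilon$ along $-g$ reduces the squared distance to the center of the inscribed $\epsilon$-ball by at least $\epsilon^2$, so starting from the origin at most about $1/\epsilon^2$ queries are needed, while only the current iterate ($d$ numbers) is stored. The names `make_ball_oracle` and `feasibility_gd` are hypothetical and used only for illustration.

```python
import math


def make_ball_oracle(center, radius):
    """Hypothetical separation oracle for a Euclidean ball (illustration only)."""
    def oracle(x):
        diff = [xi - ci for xi, ci in zip(x, center)]
        dist = math.sqrt(sum(v * v for v in diff))
        if dist <= radius:
            return None  # x is feasible
        # Unit normal g satisfying <g, x - y> >= 0 for every y in the ball.
        return [v / dist for v in diff]
    return oracle


def feasibility_gd(oracle, d, eps):
    """Gradient-descent-style feasibility search storing only the iterate x.

    Assumes the feasible set lies in the unit ball and contains a ball of
    radius eps. Returns (point, number_of_oracle_queries), or (None, budget)
    if the query budget is exhausted.
    """
    x = [0.0] * d  # start at the center of the unit ball
    max_queries = math.ceil(1.0 / eps ** 2) + 1
    for t in range(max_queries):
        g = oracle(x)
        if g is None:
            return x, t + 1
        # Step of size eps against the separating direction.
        x = [xi - eps * gi for xi, gi in zip(x, g)]
    return None, max_queries


if __name__ == "__main__":
    eps = 0.1
    center = [0.55, 0.0, 0.0]  # toy feasible set: ball of radius eps around this point
    oracle = make_ball_oracle(center, eps)
    point, queries = feasibility_gd(oracle, d=3, eps=eps)
    print(point, queries)
```

The toy oracle here is only a stand-in for testing; any oracle that satisfies the stated separation property could be passed to `feasibility_gd` instead.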
