Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems

Abstract

In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $\epsilon>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $\epsilon \geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+\delta}$ bits of memory or must make at least $1/(d^{0.01\delta}\epsilon^{2\frac{1-\delta}{1+1.01\delta}-o(1)})$ oracle queries, for any $\delta\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+\delta}$ memory or make at least $1/(d^{2\delta}\epsilon^{2(1-4\delta)-o(1)})$ queries for any $\delta\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal O(d\ln 1/\epsilon)$ but makes $\Omega(1/\epsilon^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/\epsilon$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition, since with quadratic $\mathcal O(d^2\ln 1/\epsilon)$ memory, cutting plane methods only require $\mathcal O(d\ln 1/\epsilon)$ queries.
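To make the setting concrete, here is a minimal Python sketch of a gradient-descent-style method for the feasibility problem. It is an illustration under stated assumptions, not the construction analysed in the paper. The names `gradient_descent_feasibility` and `make_ball_oracle` are hypothetical, the oracle is assumed to return a unit-norm separating direction, and the toy feasible set is simply a ball of radius $\epsilon$. The sketch stores only the current iterate ($d$ floats), which mirrors the linear-memory regime; it does not attempt the bit-level accounting behind the $\mathcal O(d\ln 1/\epsilon)$ figure.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
# Assumed oracle convention: return None if x is feasible, otherwise a
# unit vector g with <g, x - y> > 0 for every feasible point y.
Oracle = Callable[[Vector], Optional[Vector]]


def make_ball_oracle(center: Vector, radius: float) -> Oracle:
    """Toy separation oracle for the ball B(center, radius) (illustrative only)."""
    def oracle(x: Vector) -> Optional[Vector]:
        diff = [xi - ci for xi, ci in zip(x, center)]
        norm = math.sqrt(sum(v * v for v in diff))
        if norm <= radius:
            return None  # x is feasible
        return [v / norm for v in diff]  # unit separating direction
    return oracle


def gradient_descent_feasibility(
    oracle: Oracle, d: int, eps: float
) -> Tuple[Vector, int]:
    """Step against the separating direction with fixed step size eps.

    If the feasible set contains a ball B(c, eps) inside the unit ball,
    each step reduces ||x - c||^2 by at least eps^2, so roughly 1/eps^2
    queries suffice. Only the current iterate is kept in memory.
    """
    x = [0.0] * d  # start at the center of the unit ball
    max_queries = math.ceil(1.0 / eps ** 2) + 1
    for queries in range(1, max_queries + 1):
        g = oracle(x)
        if g is None:
            return x, queries
        x = [xi - eps * gi for xi, gi in zip(x, g)]
    raise RuntimeError("query budget exhausted; oracle may violate the assumptions")


if __name__ == "__main__":
    eps = 0.1
    oracle = make_ball_oracle([0.3, -0.4, 0.2], eps)
    point, used = gradient_descent_feasibility(oracle, d=3, eps=eps)
    print("feasible point:", point, "queries used:", used)
```

The query budget follows from a standard argument: if $c$ is the center of the contained ball and $g$ is the returned unit direction, then $\langle g, x-c\rangle \geq \epsilon$, so $\|x-\epsilon g-c\|^2 \leq \|x-c\|^2-\epsilon^2$. Since $\|x_0-c\|\leq 1$, about $1/\epsilon^2$ updates suffice.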
