
The Online Saddle Point Problem: Applications to Online Convex Optimization with Knapsacks

Abstract

We study the online saddle point problem, an online learning problem in which, at each iteration, a pair of actions must be chosen without knowledge of the current and future (convex-concave) payoff functions. The objective is to minimize the gap between the cumulative payoffs and the saddle point value of the aggregate payoff function, which we measure using a metric called "SP-regret". The problem generalizes the online convex optimization framework and can be interpreted as finding the Nash equilibrium for the aggregate of a sequence of two-player zero-sum games. We propose an algorithm that achieves $\tilde{O}(\sqrt{T})$ SP-regret in the general case and $O(\log T)$ SP-regret in the strongly convex-concave case. We then consider an online convex optimization with knapsacks problem motivated by a wide variety of applications, such as dynamic pricing, auctions, and crowdsourcing. We relate this problem to the online saddle point problem and establish $O(\sqrt{T})$ regret using a primal-dual algorithm.
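
To make the metric concrete, the SP-regret described above can be sketched as follows. The symbols $\mathcal{L}_t$ (payoff function at iteration $t$), $(x_t, y_t)$ (the pair of actions chosen at iteration $t$), and $X$, $Y$ (the action sets) are notation assumed here for illustration; the abstract does not fix them, and taking the absolute value of the gap is likewise an assumption.

```latex
% Sketch of SP-regret under the assumed notation:
% cumulative payoff of the chosen actions versus the saddle point
% value of the aggregate payoff function.
\[
  \text{SP-Regret}(T)
  = \left|
      \sum_{t=1}^{T} \mathcal{L}_t(x_t, y_t)
      \;-\;
      \min_{x \in X} \max_{y \in Y} \sum_{t=1}^{T} \mathcal{L}_t(x, y)
    \right|
\]
```

Under this reading, if the payoff functions do not depend on $y$, the inner maximum is vacuous and the expression reduces to the absolute value of the standard online convex optimization regret, which is consistent with the statement that the problem generalizes that framework.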
