The Online Saddle Point Problem: Applications to Online Convex
Optimization with Knapsacks
We study the online saddle point problem, an online learning problem where at each iteration a pair of actions need to be chosen without knowledge of the current and future (convex-concave) payoff functions. The objective is to minimize the gap between the cumulative payoffs and the saddle point value of the aggregate payoff function, which we measure using a metric called "SP-regret". The problem generalizes the online convex optimization framework and can be interpreted as finding the Nash equilibrium for the aggregate of a sequence of two-player zero-sum games. We propose an algorithm that achieves SP-regret in the general case, and SP-regret for the strongly convex-concave case. We then consider an online convex optimization with knapsacks problem motivated by a wide variety of applications such as: dynamic pricing, auctions, and crowdsourcing. We relate this problem to the online saddle point problem and establish regret using a primal-dual algorithm.
View on arXiv