arXiv:2003.09795 (v7, latest)

Optimal No-regret Learning in Repeated First-price Auctions

22 March 2020
Yanjun Han
Zhengyuan Zhou
Tsachy Weissman
Abstract

We study online learning in repeated first-price auctions where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, the bidder faces a censored feedback: if she wins the bid, then she is not able to observe the highest bid of the other bidders, which we assume is iid drawn from an unknown distribution. In this paper, we develop the first learning algorithm that achieves a near-optimal $\widetilde{O}(\sqrt{T})$ regret bound, by exploiting two structural properties of first-price auctions, i.e. the specific feedback structure and payoff function. The feedback in first-price auctions combines the graph feedback across actions (bids), the cross learning across contexts (private values), and a partial order over the contexts; we generalize it as the partially ordered contextual bandits. We establish both strengths and weaknesses of this framework, by showing a curious separation that a regret nearly independent of the action/context sizes is possible under stochastic contexts, but is impossible under adversarial contexts. In particular, this framework leads to an $O(\sqrt{T}\log^{2.5} T)$ regret for first-price auctions when the bidder's private values are iid. Despite the limitation of the above framework, we further exploit the special payoff function of first-price auctions to develop a sample-efficient algorithm even in the presence of adversarially generated private values. We establish an $O(\sqrt{T}\log^3 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for first-price auctions.
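
The censored feedback described in the abstract can be illustrated with a minimal sketch of a single auction round. This is not the paper's algorithm; it only models one round under stated assumptions: the payoff is taken to be the private value minus the bid when the bidder wins and zero otherwise, ties are assumed to go to the bidder, and the function names `first_price_round` and `deducible_payoffs` are hypothetical. The second function shows one reading of the "graph feedback across actions": after a win the bidder only learns that the others' highest bid was at most her own, so she can deduce the payoff of any bid at least as high; after a loss the winning bid reveals the others' highest bid, so the payoff of every candidate bid is known.

```python
from typing import Dict, Optional, Sequence, Tuple


def first_price_round(
    value: float, bid: float, highest_other_bid: float
) -> Tuple[float, float, Optional[float]]:
    """Simulate one round; return (payoff, winning_bid, observed_other_bid).

    Assumption: ties are broken in the bidder's favor (bid >= highest_other_bid wins).
    """
    wins = bid >= highest_other_bid
    # Assumed payoff: the winner pays her own bid; a loser gets zero.
    payoff = (value - bid) if wins else 0.0
    # Only the winning bid is revealed at the end of the round.
    winning_bid = max(bid, highest_other_bid)
    # Censoring: when the bidder wins, the others' highest bid stays hidden.
    observed_other_bid = None if wins else highest_other_bid
    return payoff, winning_bid, observed_other_bid


def deducible_payoffs(
    value: float,
    bid: float,
    highest_other_bid: float,
    candidate_bids: Sequence[float],
) -> Dict[float, float]:
    """Return the counterfactual payoffs the bidder can infer after one round."""
    _, _, observed = first_price_round(value, bid, highest_other_bid)
    known: Dict[float, float] = {}
    for c in candidate_bids:
        if observed is None:
            # Won: the others' highest bid is at most `bid`,
            # so any candidate c >= bid would also have won.
            if c >= bid:
                known[c] = value - c
        else:
            # Lost: the others' highest bid is observed, so every candidate is resolved.
            known[c] = (value - c) if c >= observed else 0.0
    return known
```

For example, calling `deducible_payoffs(0.8, 0.5, 0.3, [0.2, 0.5, 0.7])` returns entries only for the bids 0.5 and 0.7, because the outcome of the lower bid 0.2 cannot be inferred from a win at 0.5.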
