arXiv:2406.12406

Fast Rates for Bandit PAC Multiclass Classification

18 June 2024
Liad Erez
Alon Cohen
Tomer Koren
Yishay Mansour
Shay Moran
Abstract

We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem, with sample complexity of $O\big( (\operatorname{poly}(K) + 1 / \varepsilon^2) \log (|H| / \delta) \big)$ for any finite hypothesis class $H$. In terms of the leading dependence on $\varepsilon$, this improves upon existing bounds for the problem, that are of the form $O(K/\varepsilon^2)$. We also provide an extension of this result to general classes and establish similar sample complexity bounds in which $\log |H|$ is replaced by the Natarajan dimension. This matches the optimal rate in the full-information version of the problem and resolves an open question studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011) who demonstrated that the multiplicative price of bandit feedback in realizable PAC learning is $\Theta(K)$. We complement this by revealing a stark contrast with the agnostic case, where the price of bandit feedback is only $O(1)$ as $\varepsilon \to 0$. Our algorithm utilizes a stochastic optimization technique to minimize a log-barrier potential based on Frank-Wolfe updates for computing a low-variance exploration distribution over the hypotheses, and is made computationally efficient provided access to an ERM oracle over $H$.
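To make the bandit feedback model concrete, here is a minimal Python sketch of a simple baseline. It is not the algorithm from the paper. It assumes uniform random exploration over the $K$ labels and an importance-weighted estimate of each hypothesis's zero-one loss. The names `bandit_feedback` and `ips_loss_estimates` and the toy data are hypothetical and chosen only for illustration. Each per-example estimate is unbiased, but its variance grows with $K$. This is the kind of cost that the low-variance exploration distribution described in the abstract is designed to reduce.

```python
import random


def bandit_feedback(prediction, true_label):
    """Bandit feedback: reveal only whether the predicted label was correct."""
    return prediction == true_label


def ips_loss_estimates(hypotheses, examples, num_labels, rng):
    """Estimate the zero-one loss of every hypothesis from bandit feedback.

    Assumption (illustrative baseline only): each prediction is drawn
    uniformly at random from the num_labels labels, so every label is
    predicted with probability 1 / num_labels.
    """
    totals = [0.0] * len(hypotheses)
    for x, y in examples:
        guess = rng.randrange(num_labels)
        correct = bandit_feedback(guess, y)  # the only place y is used
        for i, h in enumerate(hypotheses):
            # Importance-weighted estimate of 1[h(x) == y]: it equals
            # num_labels when h agrees with a correct guess, else 0.
            hit = correct and h(x) == guess
            reward = num_labels if hit else 0
            totals[i] += 1.0 - reward
    return [total / len(examples) for total in totals]


if __name__ == "__main__":
    rng = random.Random(0)
    K = 3
    # Toy data: the true label is x mod K.
    data = [(x, x % K) for x in (rng.randrange(300) for _ in range(6000))]
    H = [lambda x: x % K, lambda x: 0]  # a perfect rule and a constant rule
    print(ips_loss_estimates(H, data, K, rng))
```

In this toy setup the first hypothesis has true loss 0 and the second has true loss 2/3, so the printed estimates should land near those values. An individual per-example estimate can fall outside $[0, 1]$ because of the importance weighting.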
