Smooth Bandit Optimization: Generalization to Hölder Space

11 December 2020
Yusha Liu
Yining Wang
Aarti Singh
arXiv:2012.06076
Abstract

We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for $\alpha$-Hölder continuous (including Lipschitz) functions with $0<\alpha\leq 1$. Our main result generalizes the reward function to the Hölder space with exponent $\alpha>1$, bridging the gap between Lipschitz bandits and infinitely differentiable models such as linear bandits. For Hölder continuous functions, approaches based on random sampling in bins of a discretized domain suffice for optimality. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithms can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T^{\frac{d+\alpha}{d+2\alpha}})$ for $\alpha>1$, which matches the existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of Hölder spaces indexed by $\alpha$, applying a bandit model selection approach to our proposed two-layer algorithms. We show that it achieves a regret rate matching the existing lower bound for adaptation within the $\alpha\leq 1$ subset.
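To make the two-layer structure concrete, below is a minimal one-dimensional sketch, not the authors' algorithm: the top layer runs a standard UCB routine over bins of a discretized domain, and inside the chosen bin a degree-$\lfloor\alpha\rfloor$ polynomial least-squares fit stands in for the misspecified polynomial bandit. The reward function, noise level, horizon T, exponent alpha, and bin count M are all illustrative assumptions; the paper tunes the number of bins as a function of T, alpha, and the dimension d.

import numpy as np

# Hypothetical illustration of the two-layer idea; not the paper's code.
rng = np.random.default_rng(0)

def reward(x):
    # Unknown smooth reward plus Gaussian noise (std 0.1 assumed for the demo).
    return np.exp(-8.0 * (x - 0.37) ** 2) + 0.1 * rng.standard_normal()

T = 5000           # horizon (illustrative)
alpha = 2.0        # assumed Hoelder exponent > 1, so polynomial degree k = 2
M = 8              # number of bins (the paper chooses M from T, alpha, d)
k = int(alpha)

edges = np.linspace(0.0, 1.0, M + 1)
obs = [[] for _ in range(M)]   # (x, y) pairs observed in each bin
pulls = np.zeros(M)
means = np.zeros(M)

for t in range(1, T + 1):
    # Layer 1: UCB over bins, treating each bin as one arm.
    ucb = means + np.sqrt(2.0 * np.log(t) / np.maximum(pulls, 1))
    ucb[pulls == 0] = np.inf
    b = int(np.argmax(ucb))
    lo, hi = edges[b], edges[b + 1]

    # Layer 2: inside bin b, fit a degree-k polynomial to past data and
    # play its argmax over a grid (a crude surrogate for a misspecified
    # polynomial bandit); explore uniformly until the fit is identifiable.
    if len(obs[b]) <= k + 1:
        x = rng.uniform(lo, hi)
    else:
        xs, ys = map(np.array, zip(*obs[b]))
        coef = np.polyfit(xs, ys, k)
        grid = np.linspace(lo, hi, 50)
        x = grid[np.argmax(np.polyval(coef, grid))]

    y = reward(x)
    obs[b].append((x, y))
    pulls[b] += 1
    means[b] += (y - means[b]) / pulls[b]

best = int(np.argmax(means))
print(f"most-pulled bin: [{edges[best]:.2f}, {edges[best+1]:.2f}], "
      f"mean reward {means[best]:.3f}")

The point of the higher-order fit is visible here: with alpha > 1, pure bin-averaging discards curvature information inside each bin, whereas the local polynomial surrogate can place pulls near the within-bin maximizer, which is what drives the improved regret exponent.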
