
Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

11 June 2022
Wonyoung Kim
M. Paik
Min-hwan Oh
Abstract

We propose a novel algorithm for linear contextual bandits with an $O(\sqrt{dT \log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon. Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, our proposed estimator takes contribution either from the contexts of all arms or from the selected contexts. We establish a self-normalized bound for our estimator, which allows a novel decomposition of the cumulative regret into additive dimension-dependent terms instead of multiplicative terms. We also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem setting. Hence, the regret of our proposed algorithm matches the lower bound up to logarithmic factors. The numerical experiments support the theoretical guarantees and show that our proposed method outperforms the existing linear bandit algorithms.
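
To make the randomized-estimator idea concrete, below is a minimal Python sketch of a linear contextual bandit loop in which exploration comes only from randomizing which contexts feed a ridge-style estimator: with probability p the Gram matrix absorbs the contexts of all arms, otherwise only the selected arm's context. Everything here (Gaussian contexts, noise level 0.1, p = 0.5, the specific update rule) is an illustrative assumption, not the authors' algorithm or analysis.

```python
# Toy linear contextual bandit where exploration is injected by explicitly
# randomizing which contexts enter the estimator. Illustrative sketch only;
# the update rule is a simplified stand-in for the idea in the abstract.
import numpy as np

rng = np.random.default_rng(0)
d, K, T, lam, p = 5, 10, 2000, 1.0, 0.5         # dim, arms, horizon, ridge reg., randomization prob.
theta_star = rng.normal(size=d) / np.sqrt(d)    # unknown true parameter

A = lam * np.eye(d)                             # ridge-regularized Gram matrix
b = np.zeros(d)                                 # reward-weighted context sum
cum_regret = 0.0

for t in range(T):
    X = rng.normal(size=(K, d)) / np.sqrt(d)    # contexts of the K arms at round t
    theta_hat = np.linalg.solve(A, b)           # current ridge estimate
    a = int(np.argmax(X @ theta_hat))           # greedy arm under the estimate
    r = X[a] @ theta_star + 0.1 * rng.normal()  # noisy linear reward
    cum_regret += np.max(X @ theta_star) - X[a] @ theta_star

    if rng.random() < p:
        # "squeeze all": contribute the contexts of every arm to the Gram matrix
        A += X.T @ X
    else:
        # contribute only the selected arm's context
        A += np.outer(X[a], X[a])
    b += X[a] * r                               # reward is observed only for the pulled arm

print(f"cumulative regret after T={T} rounds: {cum_regret:.2f}")
```

In this toy setup the printed cumulative regret grows sublinearly in T, which is the kind of behavior the paper's $O(\sqrt{dT \log T})$ bound formalizes; the sketch itself carries no such guarantee.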

View on arXiv