
arXiv:2406.10131
Linear Contextual Bandits with Hybrid Payoff: Revisited

14 June 2024
Nirjhar Das
Gaurav Sinha
Abstract

We study the Linear Contextual Bandit problem in the hybrid reward setting. In this setting, every arm's reward model contains arm-specific parameters in addition to parameters shared across the reward models of all arms. We can reduce this setting to two closely related settings: (a) Shared, with no arm-specific parameters, and (b) Disjoint, with only arm-specific parameters, enabling the application of two popular state-of-the-art algorithms, LinUCB and DisLinUCB (Algorithm 1 in Li et al. 2010). When the arm features are stochastic and satisfy a popular diversity condition, we provide new regret analyses for both algorithms that significantly improve on their known regret guarantees. Our novel analysis critically exploits the hybrid reward structure and the diversity condition. Moreover, we introduce a new algorithm, HyLinUCB, that crucially modifies LinUCB (using a new exploration coefficient) to account for sparsity in the hybrid setting. Under the same diversity assumptions, we prove that HyLinUCB also incurs only $O(\sqrt{T})$ regret over $T$ rounds. We perform extensive experiments on synthetic and real-world datasets demonstrating the strong empirical performance of HyLinUCB. When the number of arm-specific parameters is much larger than the number of shared parameters, we observe that DisLinUCB incurs the lowest regret; in this case, the regret of HyLinUCB is the second best and highly competitive with DisLinUCB. In all other situations, including on our real-world dataset, HyLinUCB has significantly lower regret than LinUCB, DisLinUCB, and the other SOTA baselines we considered. We also empirically observe that the regret of HyLinUCB grows much more slowly with the number of arms than that of the baselines, making it suitable even for very large action spaces.
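
To make the hybrid setting concrete: in the hybrid model of Li et al. (2010), the expected reward of arm $a$ at round $t$ is $\mathbb{E}[r_{t,a}] = z_{t,a}^\top \beta^* + x_{t,a}^\top \theta_a^*$, where $\beta^*$ is shared across all arms and $\theta_a^*$ is specific to arm $a$. Below is a minimal Python sketch of a LinUCB-style optimistic algorithm on this model via the Shared reduction, i.e., stacking the shared features and a per-arm block of disjoint features into one vector and running ridge-regression UCB on it. The class name, the placeholder exploration coefficient `alpha`, and all implementation details are illustrative assumptions, not the paper's HyLinUCB; the paper's contribution is precisely a carefully derived exploration coefficient and its regret analysis.

```python
import numpy as np

class HybridLinUCBSketch:
    """Minimal LinUCB-style sketch for the hybrid payoff setting.

    Models E[r_{t,a}] = z_{t,a}^T beta + x_{t,a}^T theta_a, with beta shared
    and theta_a arm-specific, via the Shared reduction: each arm's features
    are embedded into one long vector with a per-arm block for x.
    `alpha` is a placeholder exploration coefficient, not HyLinUCB's.
    """

    def __init__(self, n_arms, d_shared, d_arm, alpha=1.0, lam=1.0):
        self.n_arms, self.d_shared, self.d_arm = n_arms, d_shared, d_arm
        d = d_shared + n_arms * d_arm            # total parameter dimension
        self.alpha = alpha
        self.A = lam * np.eye(d)                 # ridge-regularized Gram matrix
        self.b = np.zeros(d)                     # feature-weighted reward sum

    def _stack(self, arm, z, x):
        """Embed (z, x) of one arm into the joint parameter space."""
        v = np.zeros(self.d_shared + self.n_arms * self.d_arm)
        v[:self.d_shared] = z                    # shared block
        s = self.d_shared + arm * self.d_arm
        v[s:s + self.d_arm] = x                  # this arm's disjoint block
        return v

    def select(self, Z, X):
        """Pick the arm with the highest optimistic reward estimate.

        Z: (n_arms, d_shared) shared features; X: (n_arms, d_arm) arm features.
        """
        theta = np.linalg.solve(self.A, self.b)  # ridge estimate of parameters
        A_inv = np.linalg.inv(self.A)
        ucbs = []
        for a in range(self.n_arms):
            v = self._stack(a, Z[a], X[a])
            width = np.sqrt(v @ A_inv @ v)       # confidence width
            ucbs.append(v @ theta + self.alpha * width)
        return int(np.argmax(ucbs))

    def update(self, arm, z, x, reward):
        """Rank-one update of the Gram matrix and reward statistics."""
        v = self._stack(arm, z, x)
        self.A += np.outer(v, v)
        self.b += reward * v

# Toy usage with random features (purely illustrative).
rng = np.random.default_rng(0)
algo = HybridLinUCBSketch(n_arms=5, d_shared=3, d_arm=2)
for t in range(100):
    Z, X = rng.normal(size=(5, 3)), rng.normal(size=(5, 2))
    a = algo.select(Z, X)
    algo.update(a, Z[a], X[a], reward=rng.normal())
```

Note that this sketch inverts the full $(d_{shared} + K \cdot d_{arm})$-dimensional Gram matrix each round for simplicity; the sparsity of the stacked vectors (only one disjoint block is nonzero per pull) is exactly the structure the abstract says HyLinUCB's modified exploration coefficient is designed to exploit.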
