
Bilinear Bandits with Low-rank Structure

8 January 2019
Kwang-Sung Jun
Rebecca Willett
S. Wright
Robert D. Nowak
Abstract

We introduce the bilinear bandit problem with low-rank structure, in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. The unknown in the problem is a $d_1 \times d_2$ matrix $\mathbf{\Theta}^*$ that defines the reward and has low rank $r \ll \min\{d_1, d_2\}$. Determining $\mathbf{\Theta}^*$ with this low-rank structure poses a significant challenge in finding the right exploration-exploitation tradeoff. In this work, we propose a new two-stage algorithm called "Explore-Subspace-Then-Refine" (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called "almost-low-dimensional OFUL" (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is $\widetilde{\mathcal{O}}((d_1+d_2)^{3/2}\sqrt{rT})$, where $\widetilde{\mathcal{O}}$ hides logarithmic factors and $T$ is the time horizon. This improves upon the regret of $\widetilde{\mathcal{O}}(d_1 d_2 \sqrt{T})$ attained by a naïve linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors, and our preliminary experiment shows that ESTR outperforms a naïve linear bandit reduction.
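The bilinear reward can be written as $x^\top \mathbf{\Theta}^* z$ for left-arm features $x \in \mathbb{R}^{d_1}$ and right-arm features $z \in \mathbb{R}^{d_2}$. Since $x^\top \mathbf{\Theta}^* z = \langle \mathrm{vec}(\mathbf{\Theta}^*), \mathrm{vec}(x z^\top) \rangle$, any pair of arms can be treated as a single arm in a $d_1 d_2$-dimensional linear bandit; this is the naïve reduction the abstract compares against, and it ignores the low rank. The sketch below (NumPy; the names `bilinear_reward`, `vectorized_arm`, and the random rank-2 $\mathbf{\Theta}$ are hypothetical, chosen only for illustration, with no reward noise) checks that identity numerically.

```python
import numpy as np

def bilinear_reward(x, z, theta):
    """Noise-free bilinear reward x^T Theta z (hypothetical helper)."""
    return float(x @ theta @ z)

def vectorized_arm(x, z):
    """Arm feature under the naive linear reduction: vec(x z^T), length d1 * d2."""
    return np.outer(x, z).ravel()

rng = np.random.default_rng(0)
d1, d2, r = 8, 6, 2

# Illustrative low-rank parameter: (d1 x r) times (r x d2) has rank r.
theta = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
x = rng.standard_normal(d1)
z = rng.standard_normal(d2)

# The same reward, computed as a linear function in d1 * d2 dimensions.
lin = float(vectorized_arm(x, z) @ theta.ravel())
print(bilinear_reward(x, z, theta), lin)
print("rank:", np.linalg.matrix_rank(theta), "entries:", theta.size)
```

The linear view has $d_1 d_2 = 48$ unknowns here, while a rank-$r$ matrix has only about $r(d_1 + d_2)$ degrees of freedom, which is the gap the low-rank structure is meant to exploit.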
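To get a feel for the improvement claimed in the abstract, one can compare the two regret orders with constants and logarithmic factors dropped. This is arithmetic on the stated bounds only, not a prediction of actual regret; the values $d_1 = d_2 = 100$, $r = 2$, $T = 10^6$ are arbitrary examples. For $d_1 = d_2 = d$ the ratio simplifies to $d^2\sqrt{T} / ((2d)^{3/2}\sqrt{rT}) = \sqrt{d/(8r)}$.

```python
import math

def naive_regret_order(d1, d2, T):
    # Order of O~(d1 d2 sqrt(T)) for the naive linear-bandit reduction (no constants, no logs).
    return d1 * d2 * math.sqrt(T)

def estr_regret_order(d1, d2, r, T):
    # Order of O~((d1 + d2)^{3/2} sqrt(r T)) for ESTR (no constants, no logs).
    return (d1 + d2) ** 1.5 * math.sqrt(r * T)

d1 = d2 = 100
r, T = 2, 10**6
naive = naive_regret_order(d1, d2, T)
estr = estr_regret_order(d1, d2, r, T)
print(f"naive ~ {naive:.3g}, ESTR ~ {estr:.3g}, ratio ~ {naive / estr:.2f}")
```

Because the ratio grows like $\sqrt{d/r}$, the advantage of the ESTR bound widens as the feature dimension grows relative to the rank.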
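The abstract says only that LowOFUL refines the estimated subspace "via a regularization technique". One plausible reading, assumed here and not stated in the abstract, is a ridge penalty that is weak ($\lambda$) on the $k$ coordinates spanning the estimated subspace and strong ($\lambda_\perp$) on the remaining coordinates, so those are shrunk toward zero without being fixed at zero. The sketch below implements only that weighted ridge estimate; the function name `weighted_ridge`, the split point `k`, and the synthetic data are hypothetical, and the confidence-set and arm-selection parts of an OFUL-style algorithm are omitted.

```python
import numpy as np

def weighted_ridge(X, y, k, lam, lam_perp):
    """Solve (X^T X + Lambda) theta = X^T y with Lambda = diag(lam x k, lam_perp x (p - k))."""
    p = X.shape[1]
    penalty = np.concatenate([np.full(k, lam), np.full(p - k, lam_perp)])
    A = X.T @ X + np.diag(penalty)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(1)
p, k, n = 10, 3, 200

# Synthetic parameter: large on the first k coordinates, "almost" zero elsewhere.
theta_true = np.zeros(p)
theta_true[:k] = rng.standard_normal(k)
theta_true[k:] = 0.01 * rng.standard_normal(p - k)

X = rng.standard_normal((n, p))
y = X @ theta_true + 0.1 * rng.standard_normal(n)

est_plain = weighted_ridge(X, y, k, lam=1.0, lam_perp=1.0)  # ordinary ridge
est_low = weighted_ridge(X, y, k, lam=1.0, lam_perp=1e4)    # heavy shrinkage off the subspace
print("tail norm, plain ridge:   ", np.linalg.norm(est_plain[k:]))
print("tail norm, weighted ridge:", np.linalg.norm(est_low[k:]))
```

Setting `lam_perp` equal to `lam` recovers ordinary ridge regression, so the only design choice is how strongly to trust the estimated subspace.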
