arXiv:2401.07298

Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

14 January 2024
Yue Kang
Cho-Jui Hsieh
T. C. Lee
Abstract

In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and a fixed but initially unknown $d_1$ by $d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, recently proposed by Lu et al. (2021) under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrictions of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea of Jun et al. (2019) by using Stein's method for subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we substantially improve the efficiency of G-ESTT by instead applying a novel exclusion idea to the estimated subspaces, and propose the G-ESTS framework. We show that, under mild assumptions and up to logarithmic terms, G-ESTT achieves a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$, while G-ESTS achieves a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$, where $M$ is a problem-dependent value. Under the reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2}\sqrt{rT}/D_{rr})$ (Lu et al., 2021), where $D_{rr}$ will be defined later. For completeness, we conduct experiments showing that our proposed algorithms, especially G-ESTS, are computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods across a suite of simulations.
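
The reward model described in the abstract can be illustrated with a minimal sketch. This is not the authors' code and does not implement G-ESTT or G-ESTS; it only simulates one round of the problem setting. The logistic link, the Bernoulli rewards, the Gaussian feature matrices, and all function and variable names are assumptions made for illustration, since the abstract does not fix any of them.

```python
import numpy as np


def make_low_rank_matrix(d1, d2, r, rng):
    """Draw a random d1 x d2 matrix of rank r with unit Frobenius norm.

    Hypothetical generator: the product of two Gaussian factors has
    rank r with probability one.
    """
    U = rng.standard_normal((d1, r))
    V = rng.standard_normal((d2, r))
    theta = U @ V.T
    return theta / np.linalg.norm(theta)  # default matrix norm is Frobenius


def sigmoid(z):
    """Logistic link, one common GLM choice (an assumption here)."""
    return 1.0 / (1.0 + np.exp(-z))


def expected_reward(X, theta, link=sigmoid):
    """GLM mean reward: link applied to the matrix inner product <X, theta>."""
    return link(np.sum(X * theta))


rng = np.random.default_rng(0)
d1, d2, r = 8, 6, 2

# Unknown parameter matrix Theta^* with rank r << min(d1, d2).
theta_star = make_low_rank_matrix(d1, d2, r, rng)

# A small set of candidate actions, each represented by a d1 x d2 feature matrix.
arms = rng.standard_normal((5, d1, d2))

# Expected reward of every arm under the assumed logistic GLM.
means = np.array([expected_reward(X, theta_star) for X in arms])
best_arm = int(np.argmax(means))

# One noisy Bernoulli reward observed after pulling the best arm.
reward = rng.binomial(1, means[best_arm])

# Instantaneous regret of each arm relative to the best one.
regrets = means[best_arm] - means
```

An agent would not have access to `theta_star`; it would have to estimate it (or its row and column subspaces) from observed rewards, which is the step the two proposed frameworks address.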
