Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

Neural Information Processing Systems (NeurIPS), 2024
Abstract

In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed but initially unknown $d_1$ by $d_2$ matrix $\Theta^*$ with rank $r \ll \{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, which was recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrictions of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea from \cite{jun2019bilinear} by using Stein's method for subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we substantially improve the efficiency of G-ESTT by instead using a novel exclusion idea on the estimated subspaces, and propose the G-ESTS framework. We also show that G-ESTT achieves a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$, while G-ESTS achieves a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$, under mild assumptions and up to logarithmic terms, where $M$ is a problem-dependent value. Under the reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments showing that our proposed algorithms, especially G-ESTS, are computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods across a suite of simulations.
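
To make the problem setting concrete, the following is a minimal sketch of the reward model only, not of G-ESTT or G-ESTS. It assumes a logistic inverse link, Gaussian arm feature matrices, and Bernoulli rewards; these choices, along with the helper names `make_low_rank_theta` and `expected_reward`, are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def make_low_rank_theta(d1, d2, r, rng):
    """Draw a random d1 x d2 matrix of rank r, scaled to unit Frobenius norm.

    Hypothetical helper: the construction U diag(s) V^T is an assumption
    used only to obtain some rank-r parameter matrix.
    """
    # Orthonormal column bases for the left and right subspaces.
    U, _ = np.linalg.qr(rng.standard_normal((d1, r)))
    V, _ = np.linalg.qr(rng.standard_normal((d2, r)))
    # Strictly positive singular values keep the rank exactly r.
    s = rng.uniform(1.0, 2.0, size=r)
    theta = U @ np.diag(s) @ V.T
    return theta / np.linalg.norm(theta)


def expected_reward(X, theta):
    """GLM mean reward: inverse link applied to the inner product <X, theta>.

    The logistic inverse link is an assumed example of a GLM link.
    """
    z = np.sum(X * theta)  # trace inner product of two matrices
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
d1, d2, r = 8, 6, 2
theta_star = make_low_rank_theta(d1, d2, r, rng)

# A finite set of candidate arms, each described by a d1 x d2 feature matrix.
arms = rng.standard_normal((20, d1, d2))
means = np.array([expected_reward(X, theta_star) for X in arms])

# An oracle that knows theta_star would pull the arm with the largest mean;
# the learner's per-round regret is measured against that arm.
best = int(np.argmax(means))
chosen = 0  # arbitrary arm, standing in for a learner's choice
instant_regret = means[best] - means[chosen]
reward = rng.binomial(1, means[chosen])  # noisy Bernoulli feedback
```

In this sketch the learner observes only `reward`, never `means` or `theta_star`; the cumulative sum of `instant_regret` over rounds is the quantity that the regret bounds above control.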
