
arXiv:2103.11489
UCB-based Algorithms for Multinomial Logistic Regression Bandits

21 March 2021
Sanae Amani
Christos Thrampoulidis
Abstract

Out of the rich family of generalized linear bandits, perhaps the most well-studied are logistic bandits, which are used in problems with binary rewards: for instance, when the learner/agent tries to maximize profit over a user who can select one of two possible outcomes (e.g., `click' vs. `no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes that can be selected by the user is larger than two (e.g., `click', `show me later', `never show again', `no click'). In this paper, we study such an extension. We use multinomial logit (MNL) to model the probability of each one of $K+1 \geq 2$ possible outcomes (+1 stands for the `not click' outcome): we assume that for a learner's action $\mathbf{x}_t$, the user selects one of $K+1 \geq 2$ outcomes, say outcome $i$, with a multinomial logit (MNL) probabilistic model with corresponding unknown parameter $\bar{\boldsymbol\theta}_{\ast i}$. Each outcome $i$ is also associated with a revenue parameter $\rho_i$, and the goal is to maximize the expected revenue. For this problem, we present MNL-UCB, an upper confidence bound (UCB)-based algorithm that achieves regret $\tilde{\mathcal{O}}(dK\sqrt{T})$ with small dependency on problem-dependent constants that can otherwise be arbitrarily large and lead to loose regret bounds. We present numerical simulations that corroborate our theoretical results.
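As an illustration of the model the abstract describes (not the authors' code), the following is a minimal sketch of MNL choice probabilities and the expected-revenue objective. It assumes outcome $0$ is the baseline `not click' outcome with its logit fixed to zero, and that the $K$ unknown parameters are stacked as rows of a matrix `Theta`; these conventions are my own for the sketch.

```python
import numpy as np

def mnl_probabilities(x, Theta):
    """MNL choice probabilities for action x (shape (d,)) under
    parameters Theta (shape (K, d)). Outcome 0 is the baseline
    ('not click') outcome, whose logit is fixed to 0."""
    logits = Theta @ x                        # (K,) logits for outcomes 1..K
    z = np.concatenate(([0.0], logits))       # prepend the baseline logit
    z -= z.max()                              # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()                        # probabilities over K+1 outcomes

def expected_revenue(x, Theta, rho):
    """Expected revenue sum_i rho_i * P(outcome i | x),
    where rho has shape (K+1,)."""
    return mnl_probabilities(x, Theta) @ rho
```

With all parameters equal to zero, every one of the $K+1$ outcomes is equally likely, which is a convenient sanity check; the bandit algorithm's job is to pick actions $\mathbf{x}_t$ that trade off estimating `Theta` against maximizing `expected_revenue`.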
