ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.17835
17
13

Optimal Streaming Algorithms for Multi-Armed Bandits

23 October 2024
Tianyuan Jin
Keke Huang
Jing Tang
Xiaokui Xiao
ArXivPDFHTML
Abstract

This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of nnn arms with reward distributions supported on [0,1][0,1][0,1] with unknown means. The arms in the stream are arriving one by one, and the algorithm cannot access an arm unless it is stored in a limited size memory. We first study the streaming \eps-toptoptop-kkk arms identification problem, which asks for kkk arms whose reward means are lower than that of the kkk-th best arm by at most \eps\eps\eps with probability at least 1−δ1-\delta1−δ. For general \eps∈(0,1)\eps \in (0,1)\eps∈(0,1), the existing solution for this problem assumes k=1k = 1k=1 and achieves the optimal sample complexity O(n\eps2log⁡1δ)O(\frac{n}{\eps^2} \log \frac{1}{\delta})O(\eps2n​logδ1​) using O(log⁡∗(n))O(\log^*(n))O(log∗(n)) (log⁡∗(n)\log^*(n)log∗(n) equals the number of times that we need to apply the logarithm function on nnn before the results is no more than 1.) memory and a single pass of the stream. We propose an algorithm that works for any kkk and achieves the optimal sample complexity O(n\eps2log⁡kδ)O(\frac{n}{\eps^2} \log\frac{k}{\delta})O(\eps2n​logδk​) using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least 1−δ1-\delta1−δ probability, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within O(log⁡Δ2−1)O(\log \Delta_2^{-1})O(logΔ2−1​) passes, where Δ2\Delta_2Δ2​ is the gap between the mean of the best arm and that of the second best arm.

View on arXiv
Comments on this paper