  3. 1905.11797
90
4
v1v2v3v4v5v6 (latest)

Fast and Accurate Repeated Decision Making

28 May 2019
Nicolò Cesa-Bianchi
Tommaso Cesari
Yishay Mansour
Vianney Perchet
Abstract

We study a setting in which a learner faces a sequence of decision tasks and must make good decisions as quickly as possible. Each task n is associated with a pair (X_n, μ_n), where X_n is a random variable and μ_n is its (unknown and potentially negative) expectation. The learner can draw arbitrarily many i.i.d. samples of X_n, but μ_n is never revealed. After some sampling, the learner can decide to stop and either accept the task, gaining μ_n as a reward, or reject it, receiving zero reward instead. A distinguishing feature of our model is that the learner's performance is measured as the expected cumulative reward divided by the expected cumulative number of drawn samples. The learner's goal is to converge to the per-sample reward of the optimal policy within a fixed class. We design an online algorithm with data-dependent theoretical guarantees for finite sets of policies, and analyze its extension to infinite classes of policies. A key technical aspect of this setting, which sets it apart from stochastic bandits, is the impossibility of obtaining unbiased estimates of the policy's performance objective.
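The setting can be illustrated with a minimal simulation sketch. This is not the paper's algorithm: it runs one fixed, hypothetical policy that draws m samples per task and accepts when the empirical mean exceeds a threshold, then computes the per-sample reward ratio the abstract describes. All names (`run_policy`, the Gaussian noise model, the ±1 task means) are illustrative assumptions.

```python
import random

def run_policy(tasks, m, threshold, rng):
    """Simulate a fixed accept/reject policy on a sequence of tasks.

    Each task is described by its mean mu_n; a sample of X_n is modeled
    here as mu_n plus unit Gaussian noise. The policy draws m i.i.d.
    samples, accepts the task (gaining mu_n) if the sample mean exceeds
    `threshold`, and otherwise rejects it (zero reward).
    Returns (total reward, total number of samples drawn).
    """
    total_reward, total_samples = 0.0, 0
    for mu in tasks:
        samples = [rng.gauss(mu, 1.0) for _ in range(m)]
        total_samples += m
        if sum(samples) / m > threshold:
            total_reward += mu  # accept: collect the (unknown) mean
    return total_reward, total_samples

# Tasks with means +1 or -1, chosen at random.
task_rng = random.Random(0)
tasks = [task_rng.choice([-1.0, 1.0]) for _ in range(2000)]

reward, samples = run_policy(tasks, m=16, threshold=0.0,
                             rng=random.Random(1))
per_sample = reward / samples  # the ratio objective of the paper
```

Note that the objective is a ratio of expectations, not an expectation of per-task rewards, which is why unbiased estimates of a policy's value are unavailable in the paper's setting.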
