Online Subset Selection using α-Core with no Augmented Regret

28 September 2022
Sourav Sahoo
Siddhant Chaudhary
S. Mukhopadhyay
Abhishek Sinha
Abstract

We revisit the classic problem of optimal subset selection in the online learning set-up. Assume that the set $[N]$ consists of $N$ distinct elements. On the $t$-th round, an adversary chooses a monotone reward function $f_t: 2^{[N]} \to \mathbb{R}_+$ that assigns a non-negative reward to each subset of $[N]$. An online policy selects (perhaps randomly) a subset $S_t \subseteq [N]$ consisting of $k$ elements before the reward function $f_t$ for the $t$-th round is revealed to the learner. As a consequence of its choice, the policy receives a reward of $f_t(S_t)$ on the $t$-th round. Our goal is to design an online sequential subset selection policy that maximizes the expected cumulative reward accumulated over a time horizon. To this end, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The proposed SCore policy is based on a new polyhedral characterization of the reward functions called $\alpha$-Core, a generalization of the Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called $\alpha$-augmented regret. In this metric, the performance of the online policy is compared with an unrestricted offline benchmark that can select all $N$ elements on every round. We show that a large class of reward functions, including submodular functions, can be efficiently optimized with the SCore policy. We also extend the proposed policy to the optimistic learning set-up, where the learner has access to additional untrusted hints regarding the reward functions. Finally, we conclude the paper with a list of open problems.
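To make the interaction model concrete, the following is a minimal sketch of the online subset-selection protocol described above. It is not the paper's SCore policy: the `online_subset_selection` routine, the uniform-random placeholder policy, and the coverage-style rewards in the example are all illustrative assumptions introduced here.

```python
import random

# Sketch of the online protocol: on each round the learner commits to a
# k-element subset S_t BEFORE the monotone reward f_t is revealed, then
# collects f_t(S_t). The default policy below is a hypothetical uniform-random
# baseline, not the SCore policy from the paper.

def online_subset_selection(N, k, reward_fns, policy=None):
    """Run the protocol over the rounds in reward_fns; return cumulative reward.

    N          -- number of ground-set elements, indexed 0..N-1
    k          -- cardinality budget for the subset chosen each round
    reward_fns -- list of monotone set functions f_t: frozenset -> float
    policy     -- callable(history) -> k-element frozenset; uniform if None
    """
    if policy is None:
        policy = lambda history: frozenset(random.sample(range(N), k))

    history, total = [], 0.0
    for f_t in reward_fns:
        S_t = policy(history)            # subset committed before f_t is seen
        r_t = f_t(S_t)                   # reward collected this round
        history.append((S_t, f_t, r_t))  # f_t becomes known after playing
        total += r_t
    return total

# Example: coverage rewards f_t(S) = |S ∩ A_t|, which are monotone submodular.
if __name__ == "__main__":
    N, k, T = 10, 3, 5
    targets = [frozenset(random.sample(range(N), 4)) for _ in range(T)]
    fns = [lambda S, A=A: float(len(S & A)) for A in targets]
    print("cumulative reward:", online_subset_selection(N, k, fns))
```

The $\alpha$-augmented regret of the abstract would compare the cumulative reward of such a policy against an offline benchmark allowed to select all $N$ elements on every round.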
