Model Selection in Reinforcement Learning with General Function Approximations

6 July 2022
Avishek Ghosh
Sayak Ray Chowdhury
Abstract

We consider model selection for classic Reinforcement Learning (RL) environments -- Multi-Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations. In the model selection framework, we do not know the function classes, denoted by $\mathcal{F}$ and $\mathcal{M}$, in which the true models -- the reward-generating function for MABs and the transition kernel for MDPs -- lie. Instead, we are given $M$ nested function (hypothesis) classes such that the true models are contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs that \emph{adapt} to the smallest function class (among the nested $M$ classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes (i.e., $\mathcal{F}$ and $\mathcal{M}$) a priori. Furthermore, for both settings, we show that the cost of model selection is an additive term in the regret with only weak (logarithmic) dependence on the learning horizon $T$.

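To make the nested-class setting concrete, below is a minimal, hypothetical Python sketch of the adapt-to-the-smallest-class idea in a simple bandit. It is not the paper's algorithm: the polynomial hypothesis classes, phase length, residual test, and escalation threshold are all illustrative assumptions chosen only to show how a learner can start from the smallest class and move to a larger one when the current class fails to explain the observed rewards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): a bandit whose mean reward is a
# degree-2 polynomial of a scalar action, and three nested hypothesis
# classes = polynomials of degree 0, 1, 2.
ACTIONS = np.linspace(-1.0, 1.0, 21)      # discretized action set
TRUE_COEF = np.array([0.2, -0.5, 1.0])    # reward(a) = 0.2 - 0.5*a + a^2
NOISE_STD = 0.1


def mean_reward(a):
    return TRUE_COEF @ np.array([1.0, a, a * a])


def features(a, degree):
    """Feature map for the nested class of polynomials up to `degree`."""
    return np.array([a ** k for k in range(degree + 1)])


def fit_and_residual(data, degree):
    """Least-squares fit within one class; return (coef, mean squared residual)."""
    X = np.array([features(a, degree) for a, _ in data])
    y = np.array([r for _, r in data])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, np.mean((X @ coef - y) ** 2)


T = 3000
PHASE = 200                          # illustrative phase length
THRESHOLD = 2.5 * NOISE_STD ** 2     # residual level that triggers escalation
degree = 0                           # start from the smallest nested class
data, total_regret = [], 0.0
best_mean = max(mean_reward(a) for a in ACTIONS)

for t in range(T):
    if t % PHASE < 20:               # short forced exploration each phase
        a = rng.choice(ACTIONS)
    else:                            # act greedily under the current class
        coef, _ = fit_and_residual(data, degree)
        a = max(ACTIONS, key=lambda x: coef @ features(x, degree))
    r = mean_reward(a) + NOISE_STD * rng.standard_normal()
    data.append((a, r))
    total_regret += best_mean - mean_reward(a)

    # Model-selection test at the end of each phase: if the current class
    # cannot explain the observed rewards, move to the next (larger) class.
    if (t + 1) % PHASE == 0 and degree < 2:
        _, resid = fit_and_residual(data, degree)
        if resid > THRESHOLD:
            degree += 1

print(f"selected class: degree {degree}, cumulative regret ~ {total_regret:.1f}")
```

The test-and-escalate loop mirrors the adaptation described in the abstract: starting from a class that is too small costs only the phases spent before the residual test detects the misfit, and once the smallest class containing the true model is reached, no further escalation occurs, so the extra regret from model selection stays an additive term on top of the oracle's regret.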