A Finite Sample Complexity Bound for Distributionally Robust Q-learning

26 February 2023
Shengbo Wang
Nian Si
Jose H. Blanchet
Zhengyuan Zhou
Topics: OOD, OffRL
Abstract

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust $Q$-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels, and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
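For intuition, the sketch below illustrates the robust Bellman backup that underlies distributionally robust Q-learning in the tabular setting: each state-action pair carries an uncertainty set of transition kernels, and the backup takes the worst case over that set. This is a simplified, model-based illustration under the assumption that the uncertainty sets are given as explicit finite lists; it is not the authors' model-free algorithm, which instead estimates the robust backup from simulator samples via a multi-level Monte Carlo estimator. The function and variable names (`robust_q_iteration`, `P_sets`) are illustrative, not from the paper.

    import numpy as np

    def robust_q_iteration(P_sets, R, gamma, n_iter=500):
        """Tabular robust Q-iteration on a finite MDP with |S| states, |A| actions.

        P_sets[s][a] is a list of candidate next-state distributions
        (each a length-|S| array) forming the uncertainty set for (s, a);
        R[s, a] is the immediate reward; gamma is the discount rate.
        """
        S, A = R.shape
        Q = np.zeros((S, A))
        for _ in range(n_iter):
            V = Q.max(axis=1)  # greedy state values under the current Q
            Q_new = np.empty_like(Q)
            for s in range(S):
                for a in range(A):
                    # Robust Bellman backup: worst-case expected next-state
                    # value over the uncertainty set for (s, a).
                    worst = min(p @ V for p in P_sets[s][a])
                    Q_new[s, a] = R[s, a] + gamma * worst
            Q = Q_new
        return Q

Repeated application of this worst-case backup converges to the optimal robust $Q$-function by the usual $\gamma$-contraction argument; the paper's contribution is bounding how many simulator samples a model-free version of this backup needs to reach an $\epsilon$-accurate estimate in the sup norm.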
