Bellman Unbiasedness: Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation

31 July 2024
Taehyun Cho
Seungyub Han
Kyungjae Lee
Seokhun Ju
Dohyeong Kim
Jungwoo Lee
Abstract

Distributional reinforcement learning improves performance by capturing environmental stochasticity, but a comprehensive theoretical understanding of its effectiveness remains elusive. Moreover, the intractability of representing infinite-dimensional return distributions has been largely overlooked. In this paper, we present a regret analysis of distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting. We first introduce the key notion of $\textit{Bellman unbiasedness}$, which is essential for exactly learnable and provably efficient distributional updates in an online manner. Among all types of statistical functionals for representing infinite-dimensional return distributions, our theoretical results demonstrate that only moment functionals can exactly capture this statistical information. Second, we propose a provably efficient algorithm, $\texttt{SF-LSVI}$, that achieves a tight regret bound of $\tilde{O}(d_E H^{3/2}\sqrt{K})$, where $H$ is the horizon, $K$ is the number of episodes, and $d_E$ is the eluder dimension of the function class.
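
To make the moment claim concrete, here is a standard derivation from distributional-RL background (an illustration under a fixed policy, not an excerpt from the paper). The return satisfies the distributional Bellman equation $Z(s) \stackrel{D}{=} R + \gamma Z(S')$ with $S' \sim P(\cdot \mid s)$, and its raw moments $m_p(s) := \mathbb{E}[Z(s)^p]$ close under this recursion by binomial expansion:

$$
m_p(s) \;=\; \mathbb{E}\big[(R + \gamma Z(S'))^p\big]
\;=\; \sum_{j=0}^{p} \binom{p}{j}\, \gamma^{j}\, \mathbb{E}\big[R^{\,p-j}\, m_j(S')\big],
$$

so the first $p$ moments at a state can be computed exactly from the first $p$ moments at successor states. By contrast, functionals such as quantiles or CVaR admit no such exact finite-dimensional update, which is the sense in which only moment functionals support unbiased, tractable distributional backups.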

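The sketch below makes the same point executable: a minimal tabular fixed-point iteration that propagates the first two moments of the return under the distributional Bellman operator. This is illustrative background only, not the paper's SF-LSVI; the MDP quantities (rewards, transition matrix P) are hypothetical stand-ins.

```python
import numpy as np

# Minimal sketch (not the paper's SF-LSVI): tabular propagation of the
# first two raw moments of the return Z(s) = r(s) + gamma * Z(s').
# The closed-form recursions
#   m1(s) = r + gamma * E[m1(s')]
#   m2(s) = r^2 + 2*gamma*r*E[m1(s')] + gamma^2 * E[m2(s')]
# illustrate why moment functionals admit exact ("Bellman unbiased") updates.

n_states = 5
gamma = 0.9
rng = np.random.default_rng(0)

# Hypothetical fixed-policy MDP: deterministic rewards, random transitions.
rewards = rng.uniform(0.0, 1.0, size=n_states)
P = rng.dirichlet(np.ones(n_states), size=n_states)  # P[s, s'] = transition prob

m1 = np.zeros(n_states)  # first moment of the return, E[Z(s)]
m2 = np.zeros(n_states)  # second moment of the return, E[Z(s)^2]

for _ in range(500):  # fixed-point iteration of the moment Bellman operator
    next_m1 = P @ m1
    next_m2 = P @ m2
    m1 = rewards + gamma * next_m1
    m2 = rewards**2 + 2 * gamma * rewards * next_m1 + gamma**2 * next_m2

variance = m2 - m1**2  # recover the return variance from the two moments
print("E[Z]:  ", np.round(m1, 3))
print("Var[Z]:", np.round(variance, 3))
```

Note the design point: the update uses only finitely many statistics (here two moments per state), yet introduces no approximation error, which is exactly the tractability property the abstract attributes to moment functionals.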
@article{cho2025_2407.21260,
  title={Bellman Unbiasedness: Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation},
  author={Taehyun Cho and Seungyub Han and Kyungjae Lee and Seokhun Ju and Dohyeong Kim and Jungwoo Lee},
  journal={arXiv preprint arXiv:2407.21260},
  year={2025}
}