Navigating to the Best Policy in Markov Decision Processes

5 June 2021

Papers citing "Navigating to the Best Policy in Markov Decision Processes"

Title
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition; Zhong Zheng, Haochen Zhang, Lingzhou Xue; OffRL; 70 2 0; 10 Oct 2024
Best Policy Identification in Linear MDPs; Jerome Taupin, Yassir Jedra, Alexandre Proutière; 33 3 0; 11 Aug 2022
Gap-Dependent Unsupervised Exploration for Reinforcement Learning; Jingfeng Wu, Vladimir Braverman, Lin F. Yang; 15 12 0; 11 Aug 2021