
Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach

Main: 8 pages
Bibliography: 3 pages
Appendix: 8 pages
Abstract

We address the problem of quantum reinforcement learning (QRL) in the model-free setting with quantum oracle access to the Markov Decision Process (MDP). This paper introduces a Quantum Natural Policy Gradient (QNPG) algorithm, which replaces the random sampling used in classical Natural Policy Gradient (NPG) estimators with a deterministic gradient estimation approach, enabling seamless integration into quantum systems. This modification introduces a bounded bias in the estimator, but the bias decays exponentially as the truncation level increases. This paper shows that the proposed QNPG algorithm achieves a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-1.5})$ queries to the quantum oracle, a significant improvement over the classical lower bound of $\tilde{\mathcal{O}}(\epsilon^{-2})$ queries to the MDP.
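The trade-off between random sampling and deterministic truncation can be seen on a toy discounted sum. The sketch below is an illustration under our own assumptions, not the paper's QNPG estimator: it assumes the quantity being estimated is a $\gamma$-discounted sum of rewards bounded by $r_{\max}$. A classical-style randomized estimator draws a horizon $T$ with $P(T \ge t) = \gamma^t$, which makes the undiscounted partial sum unbiased. A deterministic estimator instead truncates the sum at level $N$, which removes the randomness at the cost of a bias of at most $r_{\max}\gamma^N/(1-\gamma)$, decaying exponentially in $N$. The function names (`truncated_estimate`, `bias_bound`, `random_horizon_estimate`) are hypothetical.

```python
import random
from typing import Callable

# Hypothetical reward sequence r_t along one trajectory; any bounded function of t works.
Reward = Callable[[int], float]


def truncated_estimate(reward: Reward, gamma: float, n: int) -> float:
    """Deterministic estimate: sum_{t < n} gamma^t * r_t (no sampling involved)."""
    return sum(gamma ** t * reward(t) for t in range(n))


def bias_bound(r_max: float, gamma: float, n: int) -> float:
    """Geometric tail sum_{t >= n} gamma^t * r_max, an upper bound on the truncation bias."""
    return r_max * gamma ** n / (1.0 - gamma)


def random_horizon_estimate(reward: Reward, gamma: float, rng: random.Random) -> float:
    """Randomized, unbiased estimate: draw T with P(T >= t) = gamma^t, return sum_{t <= T} r_t."""
    t = 0
    total = reward(0)
    while rng.random() < gamma:  # extend the horizon by one step with probability gamma
        t += 1
        total += reward(t)
    return total


if __name__ == "__main__":
    gamma = 0.9

    def constant_reward(t: int) -> float:
        return 1.0  # constant reward, so the exact discounted value is 1 / (1 - gamma)

    exact = 1.0 / (1.0 - gamma)
    for n in (10, 20, 40):
        err = exact - truncated_estimate(constant_reward, gamma, n)
        print(f"N={n}: truncation error={err:.6f}, bound={bias_bound(1.0, gamma, n):.6f}")
```

For a constant reward the bound is tight, and each additional truncation step shrinks the bias by a factor of $\gamma$. For scale on the complexity claim, ignoring constants and logarithmic factors, a target accuracy of $\epsilon = 10^{-2}$ gives $\epsilon^{-1.5} = 10^{3}$ versus $\epsilon^{-2} = 10^{4}$.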
