
A Pontryagin Perspective on Reinforcement Learning

Abstract

Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning, where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman's equation from dynamic programming, our work builds on Pontryagin's principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.
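To illustrate the central idea, a minimal sketch follows. It is not one of the paper's three algorithms; it is a generic model-based variant in which a fixed open-loop action sequence is optimized by gradient descent through a known, differentiable model. In discrete time, reverse-mode autodiff through the rollout computes exactly the costate (adjoint) recursion of Pontryagin's principle. The pendulum dynamics, cost, horizon, and step size here are all assumptions for illustration.

import jax
import jax.numpy as jnp

dt, horizon = 0.05, 100  # assumed discretization and planning horizon

def step(state, action):
    # Assumed pendulum dynamics: state = (angle, angular velocity),
    # theta = 0 hanging down, theta = pi upright.
    theta, omega = state
    omega = omega + dt * (jnp.sin(theta) + action)  # gravity + torque
    theta = theta + dt * omega
    return jnp.array([theta, omega])

def total_cost(actions):
    # Roll out the fixed action sequence and accumulate a quadratic cost
    # penalizing distance from upright, velocity, and control effort.
    def body(state, action):
        next_state = step(state, action)
        cost = ((next_state[0] - jnp.pi) ** 2
                + 0.1 * next_state[1] ** 2
                + 0.01 * action ** 2)
        return next_state, cost
    _, costs = jax.lax.scan(body, jnp.zeros(2), actions)
    return costs.sum()

# Backprop through the rollout = discrete-time costate recursion.
grad_fn = jax.jit(jax.grad(total_cost))

actions = jnp.zeros(horizon)   # open-loop plan: one action per time step
for _ in range(500):           # plain gradient descent on the action sequence
    actions = actions - 0.1 * grad_fn(actions)
print("final cost:", total_cost(actions))

Note that no policy is learned: the optimization variable is the action sequence itself, which is what distinguishes this open-loop formulation from Bellman-style closed-loop methods.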

@article{eberhard2025_2405.18100,
  title={A Pontryagin Perspective on Reinforcement Learning},
  author={Onno Eberhard and Claire Vernade and Michael Muehlebach},
  journal={arXiv preprint arXiv:2405.18100},
  year={2025}
}