Real-Time Recurrent Reinforcement Learning

Abstract

We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a Meta-RL architecture resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm that uses temporal difference learning and eligibility traces to train the policy and the value function; (3) an online automatic differentiation algorithm for computing the gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method is capable of solving a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
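
As a rough illustration of the second ingredient, temporal difference learning with eligibility traces, the following is a minimal sketch of one TD(λ) update for a linear value function. It is not the authors' implementation: the function name `td_lambda_step`, the linear value model, and the default values of `alpha`, `gamma`, and `lam` are all assumptions chosen for illustration, and the recurrent backbone and online gradient computation described in the abstract are left out.

```python
def td_lambda_step(w, e, x, x_next, reward,
                   alpha=0.1, gamma=0.9, lam=0.8, done=False):
    """One online TD(lambda) update for a linear value function v(x) = w . x.

    w      : list of weights (hypothetical linear value model)
    e      : list of eligibility traces, same length as w
    x      : feature vector of the current observation
    x_next : feature vector of the next observation
    Returns the updated weights, updated traces, and the TD error.
    """
    # Value estimates before and after the transition.
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = 0.0 if done else sum(wi * xi for wi, xi in zip(w, x_next))

    # TD error: reward plus discounted next value minus current value.
    delta = reward + gamma * v_next - v

    # Accumulating trace: decay old credit, add the current gradient
    # (for a linear model the gradient of v with respect to w is x).
    e_new = [gamma * lam * ei + xi for ei, xi in zip(e, x)]

    # Move every weight in proportion to its trace and the TD error.
    w_new = [wi + alpha * delta * ei for wi, ei in zip(w, e_new)]
    return w_new, e_new, delta


# Usage: a single transition with two one-hot features and reward 1.
w, e = [0.0, 0.0], [0.0, 0.0]
w, e, delta = td_lambda_step(w, e, x=[1.0, 0.0], x_next=[0.0, 1.0], reward=1.0)
```

Because the update uses only the current transition and a running trace, it can be applied at every time step without storing past trajectories, which is the property that makes this family of methods a candidate for online learning.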

@article{lemmel2025_2311.04830,
  title={Real-Time Recurrent Reinforcement Learning},
  author={Julian Lemmel and Radu Grosu},
  journal={arXiv preprint arXiv:2311.04830},
  year={2025}
}