Real-Time Recurrent Reinforcement Learning

We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a meta-RL architecture resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm that uses temporal difference learning and eligibility traces to train the policy and the value function; (3) an online automatic differentiation algorithm that computes the gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method can solve a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
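The abstract combines two generic ingredients: TD learning with eligibility traces, and online gradient computation through a recurrent backbone. The sketch below is a minimal, hypothetical illustration of how these can fit together, not the authors' implementation. It assumes a plain tanh RNN (parameters `W`, `U`, `b`), a linear value readout `w_v`, and classic real-time recurrent learning (RTRL) as the online differentiation method, which the name RTRRL suggests but the abstract does not spell out. Only the value-function (critic) half is shown; a policy could use an analogous trace of the gradient of log π, scaled by the same TD error. All function names and hyperparameters are illustrative.

```python
import numpy as np


def unpack(theta, n, m):
    """Split the flat parameter vector into W (n x n), U (n x m) and b (n)."""
    W = theta[: n * n].reshape(n, n)
    U = theta[n * n : n * n + n * m].reshape(n, m)
    b = theta[n * n + n * m :]
    return W, U, b


def rtrl_step(theta, h_prev, P_prev, x, n):
    """One tanh-RNN step plus the RTRL update of P_t = dh_t/dtheta."""
    m = x.shape[0]
    W, U, b = unpack(theta, n, m)
    h = np.tanh(W @ h_prev + U @ x + b)
    eye = np.eye(n)
    # Immediate Jacobian of the pre-activation w.r.t. (W, U, b), row-major order.
    immediate = np.hstack(
        [np.kron(eye, h_prev[None, :]), np.kron(eye, x[None, :]), eye]
    )
    # Chain rule carried forward in time: diag(1 - h_t^2) (W P_{t-1} + immediate).
    P = (1.0 - h**2)[:, None] * (W @ P_prev + immediate)
    return h, P


def rtrl_forward(theta, xs, n):
    """Run a whole sequence with fixed parameters; return final h and dh/dtheta."""
    h, P = np.zeros(n), np.zeros((n, theta.size))
    for x in xs:
        h, P = rtrl_step(theta, h, P, x, n)
    return h, P


def td_lambda_critic(xs, rs, n, gamma=0.9, lam=0.8, alpha=0.01, seed=0):
    """Online TD(lambda) critic on a tanh-RNN whose gradients come from RTRL.

    xs holds T + 1 observations and rs holds T rewards (rs[t] follows xs[t]).
    """
    m = xs.shape[1]
    rng = np.random.default_rng(seed)
    theta = 0.1 * rng.standard_normal(n * n + n * m + n)
    w_v = 0.1 * rng.standard_normal(n)  # linear value readout: v = w_v . h
    e_theta, e_w = np.zeros_like(theta), np.zeros_like(w_v)
    h, P = rtrl_step(theta, np.zeros(n), np.zeros((n, theta.size)), xs[0], n)
    deltas = []
    for t in range(len(rs)):
        v = w_v @ h
        # Eligibility traces of grad v_t; for the backbone, dv/dtheta = w_v^T P_t.
        e_w = gamma * lam * e_w + h
        e_theta = gamma * lam * e_theta + w_v @ P
        h_next, P_next = rtrl_step(theta, h, P, xs[t + 1], n)
        delta = rs[t] + gamma * (w_v @ h_next) - v  # TD error
        w_v = w_v + alpha * delta * e_w
        theta = theta + alpha * delta * e_theta
        # P_next used the pre-update theta: the usual online-RTRL approximation.
        h, P = h_next, P_next
        deltas.append(delta)
    return theta, w_v, np.array(deltas)
```

Because `rtrl_forward` holds the parameters fixed, its sensitivity matrix is exact and can be checked against finite differences. Inside `td_lambda_critic` the parameters change every step, so the carried-over `P` is only approximate, as is usual for online RTRL. Note that `P` stores n × (number of parameters) entries, roughly n³ for this RNN, so plain RTRL is practical only for small networks.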
@article{lemmel2025_2311.04830,
  title={Real-Time Recurrent Reinforcement Learning},
  author={Julian Lemmel and Radu Grosu},
  journal={arXiv preprint arXiv:2311.04830},
  year={2025}
}