Bayesian Reinforcement Learning: Real-world learning faster than simulations
Deep Reinforcement Learning (DRL) experiments are commonly run in simulated environments because deep neural networks demand enormous numbers of training samples. In contrast, model-based Bayesian learning allows a robot to learn good policies within a few trials in the real world. While methods such as Deep PILCO have been applied to many single-robot tasks, here we apply Deep PILCO to the problem of winning a multi-robot combat game. We compare this deep Bayesian learning algorithm with a model-free DRL algorithm, Deep Q-Learning, analyzing results from both simulations and real-world experiments. In this game, the inputs to the RL algorithms are noisy and unstable because they are derived from filtered LiDAR signals. Surprisingly, the sample-efficient Deep Bayesian RL outperforms DRL even when a real-world Deep Bayesian RL run is compared against a simulation-based Deep Q-Learning run. Our results point to the advantage of learning directly in the real world: the reality gap is bypassed, and learning proceeds faster than in simulation.
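To make the comparison concrete, below is a minimal sketch of a Deep PILCO-style model-based loop: fit a Bayesian neural network dynamics model on real transitions, then optimize the policy on uncertainty-aware imagined rollouts. It assumes PyTorch; the dropout dynamics model, the toy state/action dimensions, and the quadratic cost are illustrative assumptions, not the paper's implementation.

```python
# A minimal Deep PILCO-style sketch (assumed PyTorch; toy dimensions and cost).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, PARTICLES, HORIZON = 2, 1, 20, 25

class DropoutDynamics(nn.Module):
    """Dynamics model: MC dropout serves as approximate Bayesian inference."""
    def __init__(self, hidden=200, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, STATE_DIM),
        )

    def forward(self, state, action):
        # Predict the state change; dropout stays active (train mode) so each
        # call samples from the approximate posterior over dynamics.
        return state + self.net(torch.cat([state, action], dim=-1))

model = DropoutDynamics()
policy = nn.Sequential(nn.Linear(STATE_DIM, 50), nn.Tanh(),
                       nn.Linear(50, ACTION_DIM), nn.Tanh())
model_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def cost(states):
    # Illustrative quadratic cost: drive the state toward the origin.
    return (states ** 2).sum(dim=-1).mean()

def fit_model(s, a, s_next, epochs=100):
    # Step 1: fit the dynamics model on real-world transitions (s, a, s').
    for _ in range(epochs):
        model_opt.zero_grad()
        loss = ((model(s, a) - s_next) ** 2).mean()
        loss.backward()
        model_opt.step()

def improve_policy(init_state, steps=50):
    # Step 2: propagate a particle set through the stochastic model and
    # descend the expected imagined cost w.r.t. the policy parameters.
    # (Simplified: dropout masks are resampled every step rather than
    # fixed per particle, as Deep PILCO proper does.)
    for _ in range(steps):
        policy_opt.zero_grad()
        states = init_state.repeat(PARTICLES, 1)
        total = torch.tensor(0.0)
        for _ in range(HORIZON):
            states = model(states, policy(states))
            total = total + cost(states)
        total.backward()
        policy_opt.step()
```

Alternating a handful of real rollouts (to feed `fit_model`) with `improve_policy` is what yields the sample efficiency the abstract describes: real-world data is spent only on refining the model, while the many rollouts policy optimization needs are imagined under the model's uncertainty.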