Bayesian Reinforcement Learning: Real-world learning faster than simulations
Deep Reinforcement Learning (DRL) experiments are commonly run in simulated environments because deep neural networks demand enormous numbers of training samples. In contrast, model-based Bayesian learning allows a robot to learn good policies within a few trials in the real world. While methods such as Deep PILCO have been applied to many single-robot tasks, here we apply Deep PILCO to the problem of winning a multi-robot combat game. We compare this deep Bayesian learning algorithm with a model-free DRL algorithm, Deep Q-Learning, analyzing results from both simulations and real-world experiments. In this game, the inputs to the RL algorithms are noisy and unstable because they are derived from filtered LiDAR signals. Surprisingly, the sample-efficient Deep Bayesian RL outperforms DRL even when a real-world Deep Bayesian RL run is compared against a simulation-based Deep Q-Learning run. Our results point to the advantage of learning directly in the real world: the reality gap is bypassed, and learning proceeds faster than in simulation.
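To make the comparison concrete, below is a minimal sketch of a Deep PILCO-style model-based loop: fit a Bayesian neural network dynamics model on real transitions, then optimize the policy on uncertainty-aware imagined rollouts. It assumes PyTorch; the dropout dynamics model, the toy state/action dimensions, and the quadratic cost are illustrative assumptions, not the paper's implementation.

```python
# A minimal Deep PILCO-style sketch (assumed PyTorch; toy dimensions and cost).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, PARTICLES, HORIZON = 2, 1, 20, 25

class DropoutDynamics(nn.Module):
    """Dynamics model: MC dropout serves as approximate Bayesian inference."""
    def __init__(self, hidden=200, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, STATE_DIM),
        )

    def forward(self, state, action):
        # Predict the state change; dropout stays active (train mode) so each
        # call samples from the approximate posterior over dynamics.
        return state + self.net(torch.cat([state, action], dim=-1))

model = DropoutDynamics()
policy = nn.Sequential(nn.Linear(STATE_DIM, 50), nn.Tanh(),
                       nn.Linear(50, ACTION_DIM), nn.Tanh())
model_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def cost(states):
    # Illustrative quadratic cost: drive the state toward the origin.
    return (states ** 2).sum(dim=-1).mean()

def fit_model(s, a, s_next, epochs=100):
    # Step 1: fit the dynamics model on real-world transitions (s, a, s').
    for _ in range(epochs):
        model_opt.zero_grad()
        loss = ((model(s, a) - s_next) ** 2).mean()
        loss.backward()
        model_opt.step()

def improve_policy(init_state, steps=50):
    # Step 2: propagate a particle set through the stochastic model and
    # descend the expected imagined cost w.r.t. the policy parameters.
    # (Simplified: dropout masks are resampled every step rather than
    # fixed per particle, as Deep PILCO proper does.)
    for _ in range(steps):
        policy_opt.zero_grad()
        states = init_state.repeat(PARTICLES, 1)
        total = torch.tensor(0.0)
        for _ in range(HORIZON):
            states = model(states, policy(states))
            total = total + cost(states)
        total.backward()
        policy_opt.step()
```

Alternating a handful of real rollouts (to feed `fit_model`) with `improve_policy` is what yields the sample efficiency the abstract describes: real-world data is spent only on refining the model, while the many rollouts policy optimization needs are imagined under the model's uncertainty.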