Intelligent Trainer for Model-Based Reinforcement Learning

Abstract

In this paper, we present a model-based deep reinforcement learning (DRL) framework that addresses the high cost of the large amount of sampling data and of the tuning process required for training, so as to make DRL feasible in practical applications. The basic unit of the framework is the model-based RL training process environment (TPE), in which a target controller interacts with both physical data and cyber data (generated by a model emulator) through state, action, and reward signals for learning and training. On top of the TPE, we design an RL intelligent trainer that optimizes the training of the target controller in an online manner. This design decouples the cyber-model-related settings from the training algorithm of the target controller, providing the flexibility to implement different trainer designs. The combination of an intelligent trainer and a TPE is termed a single-head trainer; its controller can be sensitive to cyber data quality, and correlation among its actions can degrade performance. To address these problems and enhance the effectiveness of the DRL algorithm, we develop an ensemble trainer that consists of multiple single-head trainers and incorporates memory sharing, reference sampling, and weight transfer. We evaluated the proposed single-head and ensemble trainers on five OpenAI Gym tasks. The results show that the proposed method achieves competitive performance with low sampling cost, robustness, and automatic tuning. The framework can be extended to include more control actions and more sophisticated trainer designs.
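To make the TPE concept concrete, the following is a minimal sketch of the idea described above: a target controller draws training transitions either from the real environment ("physical data") or from a learned model emulator ("cyber data"), with the mix controlled by an outer trainer. All names and the ratio-based mixing rule are illustrative assumptions, not the paper's actual implementation.

```python
import random

class TrainingProcessEnvironment:
    """Hypothetical sketch of a TPE: transitions come either from the
    real environment (physical data) or from a learned dynamics model
    (cyber data), as decided by a trainer-supplied mixing ratio."""

    def __init__(self, real_env_step, model_step):
        self.real_env_step = real_env_step  # fn(state, action) -> (next_state, reward)
        self.model_step = model_step        # learned emulator, same signature
        self.real_buffer = []               # physical transitions
        self.cyber_buffer = []              # model-generated transitions

    def sample_transition(self, state, action, cyber_ratio):
        """cyber_ratio acts as the trainer's control action: the
        probability of querying the cheap model emulator instead of
        the costly real environment."""
        if random.random() < cyber_ratio:
            next_state, reward = self.model_step(state, action)
            self.cyber_buffer.append((state, action, reward, next_state))
        else:
            next_state, reward = self.real_env_step(state, action)
            self.real_buffer.append((state, action, reward, next_state))
        return next_state, reward
```

An intelligent trainer would then adjust `cyber_ratio` online based on the target controller's observed learning progress, which is the decoupling the abstract refers to: the controller's own training algorithm never needs to know where its samples came from.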
