Safety-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions

IEEE Transactions on Automatic Control (IEEE TAC), 2021

7 September 2021

Abstract

Reinforcement learning (RL) is a promising approach. However, success is limited towards real-world applications, because ensuring safe exploration and facilitating adequate exploitation is a challenge for controlling robotic systems with unknown models and measurement uncertainties. The learning problem becomes even more difficult for complex tasks over continuous state-space and action-space. In this paper, we propose a learning-based control framework consisting of several aspects: (1) we leverage Linear Temporal Logic (LTL) to express complex tasks over an infinite horizons that are translated to a novel automaton structure; (2) we propose an innovative reward scheme for RL-agents with the formal guarantee that global optimal policies maximize the probability of satisfying the LTL specifications; (3) based on a reward shaping technique, we develop a modular policy-gradient architecture exploiting the benefits of the automaton structure to decompose overall tasks and enhance the performance of learned controllers; (4) by incorporating Gaussian Processes (GPs) to estimate the uncertain dynamic systems, we synthesize a model-based safeguard using Exponential Control Barrier Functions (ECBFs) for systems with high-order relative degrees. In addition, we utilize the properties of LTL automata and ECBFs to develop a guiding process to further improve the efficiency of exploration. Finally, we demonstrate the effectiveness of the framework via several robotic environments. We show an ECBF-based modular deep RL algorithm that achieves near-perfect success rates and safety guarding with high probability confidence during training.

View on arXiv

Comments on this paper