
Improving Controller Generalization with Dimensionless Markov Decision Processes

Main: 8 pages; bibliography: 3 pages; 7 figures; 2 tables
Abstract

Controllers trained with reinforcement learning tend to be highly specialized and therefore generalize poorly when their test environment differs from their training environment. We propose a model-based approach to improve generalization in which both the world model and the policy are trained in a dimensionless state-action space. To this end, we introduce the Dimensionless Markov Decision Process (Π-MDP): an extension of Contextual MDPs in which the state and action spaces are non-dimensionalized with the Buckingham-Π theorem. This procedure induces policies that are equivariant with respect to changes in the context of the underlying dynamics. We provide a generic framework for this approach and apply it to a model-based policy search algorithm using Gaussian Process models. We demonstrate the applicability of our method on simulated actuated pendulum and cartpole systems, where policies trained on a single environment are robust to shifts in the distribution of the context.
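To make the non-dimensionalization idea concrete, here is a minimal sketch for one toy case, assuming an undamped, torque-actuated pendulum whose context is its mass m, length l and gravity g. With the time scale sqrt(l/g) and torque scale m·g·l (one standard Buckingham-Π choice, not necessarily the groups used in the paper), the dynamics in dimensionless time reduce to θ'' = -sin θ + τ*, independent of the context. All names (`PendulumContext`, `policy_pi`, `act`, and so on) and the linear feedback policy are hypothetical illustrations, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PendulumContext:
    """Hypothetical context of an undamped, torque-actuated pendulum."""
    mass: float     # m [kg]
    length: float   # l [m]
    gravity: float  # g [m/s^2]


def time_scale(ctx: PendulumContext) -> float:
    """Characteristic time sqrt(l/g) built from the context variables [s]."""
    return math.sqrt(ctx.length / ctx.gravity)


def torque_scale(ctx: PendulumContext) -> float:
    """Characteristic torque m*g*l [N m]."""
    return ctx.mass * ctx.gravity * ctx.length


def state_to_pi(theta: float, omega: float, ctx: PendulumContext) -> tuple[float, float]:
    """Map (theta [rad], omega [1/s]) to dimensionless (theta, omega * sqrt(l/g))."""
    return theta, omega * time_scale(ctx)


def torque_from_pi(tau_pi: float, ctx: PendulumContext) -> float:
    """Map a dimensionless torque back to physical units [N m]."""
    return tau_pi * torque_scale(ctx)


def angular_accel(theta: float, tau: float, ctx: PendulumContext) -> float:
    """Dimensional dynamics: m l^2 theta'' = -m g l sin(theta) + tau."""
    return -(ctx.gravity / ctx.length) * math.sin(theta) + tau / (ctx.mass * ctx.length**2)


def angular_accel_pi(theta: float, tau_pi: float, ctx: PendulumContext) -> float:
    """Angular acceleration w.r.t. dimensionless time t* = t / sqrt(l/g).

    Algebraically this equals -sin(theta) + tau_pi for every context.
    """
    tau = torque_from_pi(tau_pi, ctx)
    return angular_accel(theta, tau, ctx) * time_scale(ctx) ** 2


def policy_pi(theta: float, omega_pi: float) -> float:
    """Hypothetical policy that only ever sees dimensionless inputs."""
    return -2.0 * math.sin(theta) - 0.5 * omega_pi


def act(theta: float, omega: float, ctx: PendulumContext) -> float:
    """Non-dimensionalize the state, query the policy, re-dimensionalize the torque."""
    theta, omega_pi = state_to_pi(theta, omega, ctx)
    return torque_from_pi(policy_pi(theta, omega_pi), ctx)
```

In this toy setting, querying `act` for a pendulum that is twice as long and twice as heavy, at the same dimensionless state, returns a torque scaled by the ratio of the two m·g·l values (here 4), which illustrates the kind of equivariance to context changes that the abstract describes.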
