Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning
In this paper we study how to learn stochastic, multimodal transition dynamics for model-based reinforcement learning (RL). Stochasticity is a fundamental property of many task environments, but function approximation trained with a mean-squared error fails to approximate multimodal stochasticity: it regresses to the conditional mean of the outcome distribution. In contrast, deep generative models can capture complex, high-dimensional outcome distributions. We first discuss why, among such models, conditional variational inference (VI) is theoretically the most appealing choice for sample-based planning in model-based RL. We then study different VI models and evaluate their ability to learn complex stochasticity on simulated functions, as well as on a typical RL gridworld with strongly multimodal dynamics. Importantly, our simulations show that the VI network successfully uses its stochastic latent nodes to predict multimodal outcomes, while robustly ignoring them for the deterministic parts of the transition dynamics. In summary, we present a robust method to learn multimodal transitions with function approximation, a key prerequisite for model-based RL in stochastic domains.
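The failure mode of mean-squared-error regression described above can be illustrated with a minimal toy example (an assumption for illustration, not an experiment from the paper): when the same state-action pair can lead to next state -1 or +1 with equal probability, the MSE-optimal prediction is the mean, roughly 0, which is never an actual outcome. A model that samples a latent variable and decodes it, as conditional VI does, can instead recover both modes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bimodal transition data: from one (state, action) pair,
# the next state is -1 or +1 with equal probability.
next_states = rng.choice([-1.0, 1.0], size=10_000)

# An MSE-trained regressor converges to the conditional mean...
mse_prediction = next_states.mean()  # close to 0.0

# ...which lies far from every outcome that actually occurs:
min_dist_to_data = np.abs(next_states - mse_prediction).min()  # close to 1.0

# A latent-variable model instead draws a latent sample and decodes it.
# This hard threshold stands in for a trained decoder (a sketch of the
# idea, not a real VI model); it recovers both modes of the outcome.
z = rng.standard_normal(10_000)
generated = np.where(z > 0.0, 1.0, -1.0)
modes_recovered = set(np.unique(generated))  # {-1.0, 1.0}
```

The point of the contrast is that sample-based planning needs draws from the true outcome distribution; a deterministic mean predictor hands the planner a state that the environment can never produce.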