
On Optimal Control and Expectation-Maximisation: Theory and an Outlook Towards Algorithms

Abstract

In this work we demonstrate how both the Stochastic and the Risk Sensitive Optimal Control problem can be treated by means of the Expectation-Maximisation algorithm. We show how such a treatment materialises into two separate iterative programs, each generating a unique but closely related sequence of density functions. We argue that these density functions should be interpreted as beliefs, that is, as probabilistic proxies for the deterministic optimal policy. More formally, two fixed-point iteration schemes are derived whose stationary points coincide with the deterministic optimal policies, by virtue of the proven convergence of Expectation-Maximisation methods. We point out that our results are intimately related to the paradigm of Control as Inference, which here refers to a collection of approaches that likewise aim to recast optimal control as an instance of probabilistic inference. Although this paradigm has already resulted in the development of several powerful Reinforcement Learning algorithms, its fundamental problem statement is usually introduced by teleological arguments. We argue that the present results demonstrate that earlier established Control as Inference frameworks in fact isolate a single step from either of the proposed iterative programs; in any case, the present treatment provides them with a deontological argument of validity. By exposing the underlying technical mechanism we aim to contribute to the general acceptance of Control as Inference as a framework superseding the present Optimal Control paradigm. To motivate the general relevance of the presented treatment, we further discuss parallels with Path Integral Control and other areas of research, before sketching the outlines of future algorithmic development.
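The flavour of the fixed-point iterations described above can be conveyed by a minimal sketch, not taken from the paper itself: for a one-step problem with a finite action set, an EM-style program repeatedly reweights the current belief over actions by an exponentiated reward (E-step) and renormalises it into a density (M-step). The reward values and the temperature `eta` below are illustrative assumptions; the point is only that the iterates concentrate on the deterministic optimal action, mirroring the claimed stationary point.

```python
import numpy as np

def em_policy_iteration(rewards, eta=1.0, iters=50):
    """Illustrative EM-style fixed-point iteration over action beliefs.

    rewards : array of per-action rewards (toy stand-in for a value function)
    eta     : assumed temperature controlling how sharply beliefs concentrate
    """
    q = np.full(len(rewards), 1.0 / len(rewards))  # uniform prior belief
    for _ in range(iters):
        q = q * np.exp(eta * np.asarray(rewards, dtype=float))  # reweight by exp reward
        q = q / q.sum()                                         # renormalise to a density
    return q

rewards = [0.1, 0.5, 0.3]
belief = em_policy_iteration(rewards)
# After enough iterations the belief is (numerically) a point mass
# on argmax_a r(a), i.e. the deterministic optimal action.
print(belief.argmax())
```

After `iters` steps the belief is proportional to `exp(eta * iters * r(a))`, so the probability mass on the best action approaches one geometrically in the reward gaps; this is the sense in which the probabilistic proxies converge to the deterministic optimal policy.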
