
The relationship between dynamic programming and active inference: the discrete, finite-horizon case

Abstract

Active inference is a normative framework for generating behaviour based on the free energy principle, a theory of global brain function. This framework has been successfully used to solve reinforcement learning and stochastic control tasks, yet the formal relation between active inference and reward maximisation has not been fully explicated. In this paper, we consider the relation between active inference and dynamic programming under the Bellman equation, which underlies many approaches to reinforcement learning and control. We show that, on finite-horizon partially observed Markov decision processes, dynamic programming is a limiting case of active inference. In a fully observed environment, active inference agents seek to sample from a target distribution encoding their preferences. When these target states correspond to rewarding states, this maximises expected reward, as in reinforcement learning. When states are partially observed or ambiguous, active inference agents choose actions that minimise both risk and ambiguity. This allows active inference agents to supplement goal-seeking with exploratory behaviour. This speaks to the unifying potential of active inference, as the objective optimised during action selection subsumes many important quantities used in decision-making in the physical, engineering, and life sciences.
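To make the dynamic-programming side of this comparison concrete, the following is a minimal sketch of finite-horizon backward induction under the Bellman equation. The toy MDP (two states, two actions, horizon 3) and all numbers are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical toy finite-horizon MDP (all values are illustrative).
T = 3                      # horizon
n_states, n_actions = 2, 2
# P[a, s, s2] = probability of moving from state s to s2 under action a
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.1, 0.9]],   # action 1
])
# R[s, a] = immediate reward for taking action a in state s
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

# Backward induction: V_T(s) = 0, then for t = T-1, ..., 0
#   V_t(s) = max_a [ R(s, a) + sum_{s2} P(s2 | s, a) V_{t+1}(s2) ]
V = np.zeros(n_states)
policies = []                       # policies[t][s] = optimal action at time t
for t in reversed(range(T)):
    # Q[s, a] = R(s, a) + expected value of the successor state
    Q = R + np.einsum('ast,t->sa', P, V)
    policies.insert(0, Q.argmax(axis=1))
    V = Q.max(axis=1)

print(V)            # optimal value of each initial state
print(policies[0])  # optimal first action from each state
```

Active inference replaces the reward term with a (log) target distribution over states and adds an ambiguity term for partial observability; in the limit where preferences concentrate on rewarding states and observations are unambiguous, the recursion above is recovered.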
