Estimating Risk and Uncertainty in Deep Reinforcement Learning
We propose a method for disentangling epistemic and aleatoric uncertainty in deep reinforcement learning. Aleatoric uncertainty, or risk, arises from inherently stochastic environments or agents and must be accounted for in the design of risk-sensitive algorithms. Epistemic uncertainty, which stems from limited data, is important both for risk-sensitivity and for efficient exploration. Our method combines elements of distributional reinforcement learning with approximate Bayesian inference for neural networks, allowing us to separate the two types of uncertainty in the expected return of a policy. Specifically, the learned return distribution provides the aleatoric uncertainty, and the Bayesian posterior yields the epistemic uncertainty. Although our approach in principle requires a large number of samples from the Bayesian posterior to estimate the epistemic uncertainty, we show that two networks already yield a useful approximation. We perform experiments that illustrate our method and some of its applications.
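To make the split between the two uncertainties concrete, the following is a minimal sketch based on the law of total variance. It is not necessarily the estimator used in the paper. It assumes that each posterior sample (network) outputs a set of equally weighted quantile estimates of the return for one fixed state-action pair. The function name `decompose_uncertainty` and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def decompose_uncertainty(quantiles):
    """Split return variance into aleatoric and epistemic parts.

    quantiles: array of shape (K, N). Row k holds N equally weighted
    quantile estimates of the return from the k-th posterior sample
    (assumed layout, for illustration only).
    """
    quantiles = np.asarray(quantiles, dtype=float)
    per_net_mean = quantiles.mean(axis=1)  # expected return per network
    per_net_var = quantiles.var(axis=1)    # spread of each return distribution
    aleatoric = per_net_var.mean()         # average within-network variance
    epistemic = per_net_mean.var()         # disagreement between networks
    return aleatoric, epistemic


# Two hypothetical networks, three quantiles each (toy numbers).
q = np.array([[0.0, 1.0, 2.0],
              [1.0, 2.0, 3.0]])
aleatoric, epistemic = decompose_uncertainty(q)
print(f"aleatoric={aleatoric:.4f}, epistemic={epistemic:.4f}")
```

With population variances and the same number of quantiles per network, the two terms sum to the variance of the pooled quantiles, so the decomposition accounts for the full spread. With only two networks, the epistemic term reduces to a quarter of the squared difference between their mean returns.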