Regularization Matters in Policy Optimization

Abstract

Deep Reinforcement Learning (Deep RL) has been receiving increasing attention thanks to its encouraging performance on a variety of control tasks. Yet conventional regularization techniques for training neural networks (e.g., $L_2$ regularization, dropout) have been largely ignored in RL methods, possibly because agents are typically trained and evaluated in the same environment, and because the deep RL community focuses more on high-level algorithm design. In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. Interestingly, we find that conventional regularization techniques applied to the policy network can often bring large improvements, especially on harder tasks. We also compare with the widely used entropy regularization and find that $L_2$ regularization is generally better. Our findings are further shown to be robust to variations in training hyperparameters. We further study regularizing different components and find that regularizing only the policy network is typically the best. We hope our study provides guidance for future practice in regularizing policy optimization algorithms.
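To make the two regularizers compared above concrete, the following is a minimal sketch of how an $L_2$ penalty on the policy network's weights and an entropy bonus could each enter a policy optimization objective. It is an illustration under stated assumptions, not the implementation used in the paper: the function names, the diagonal Gaussian policy, and the coefficient values are all hypothetical, and the surrogate loss is treated as an already computed scalar.

```python
import numpy as np


def l2_penalty(params, coeff):
    """Return coeff times the sum of squared entries over all weight arrays."""
    return coeff * sum(float(np.sum(p ** 2)) for p in params)


def gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian policy given per-dimension log std devs."""
    return float(np.sum(log_std + 0.5 * np.log(2.0 * np.pi * np.e)))


def regularized_policy_loss(surrogate_loss, policy_params, log_std,
                            l2_coeff=0.0, entropy_coeff=0.0):
    """Combine a surrogate policy loss with optional regularizers.

    The L2 term is computed over the policy parameters only (no value
    network weights are passed in), and the entropy term is subtracted
    so that minimizing the loss favors higher-entropy policies.
    """
    loss = surrogate_loss
    loss += l2_penalty(policy_params, l2_coeff)
    loss -= entropy_coeff * gaussian_entropy(log_std)
    return loss


# Hypothetical policy weights and action log std devs, for illustration only.
policy_params = [np.ones((2, 3)), np.zeros(3)]
log_std = np.zeros(2)

loss_l2 = regularized_policy_loss(1.0, policy_params, log_std, l2_coeff=0.1)
loss_entropy = regularized_policy_loss(1.0, policy_params, log_std,
                                       entropy_coeff=0.01)
```

In this sketch the $L_2$ term depends only on the policy weights, mirroring the setting in which only the policy network is regularized, while the entropy term depends only on the spread of the action distribution.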
