1.3K

New perspectives on the natural gradient method

Journal of machine learning research (JMLR), 2014
Abstract

In this manuscript we review and discuss some theoretical aspects of Amari's natural gradient method, provide a unifying picture of the many different versions of it which have appeared over the years, and offer some new insights and perspectives regarding the method and its relationship to other optimization methods. Among our various contributions is the identification of a general condition under which the Fisher information matrix and Schraudolph's generalized Gauss-Newton matrix are equivalent. This equivalence implies that optimization methods which use the latter matrix, such as the Hessian-free optimization approach of Martens, are actually natural gradient methods in disguise. It also lets us view natural gradient methods as approximate Newton methods, justifying the application of various "update damping" techniques to them, which are designed to compensate for break-downs in local quadratic approximations. Additionally, we analyze the parameterization invariance possessed by the natural gradient method in the idealized setting of infinitesimally small update steps, and consider the extent to which it holds for practical versions of the method which take large discrete steps. We go on to show that parameterization invariance is not possessed by the classical Newton-Raphson method (even in the idealized setting), and then give a general characterization of gradient-based methods which do possess it.

View on arXiv
Comments on this paper