Higher Order Generalization Error for First Order Discretization of Langevin Diffusion

Abstract

We propose a novel approach to analyzing the generalization error of discretizations of Langevin diffusion, such as stochastic gradient Langevin dynamics (SGLD). For an $\epsilon$ tolerance of expected generalization error, it is known that a first order discretization can reach this target if we run $\Omega(\epsilon^{-1} \log(\epsilon^{-1}))$ iterations with $\Omega(\epsilon^{-1})$ samples. In this article, we show that with additional smoothness assumptions, even first order methods can achieve arbitrarily low runtime complexity. More precisely, for each $N > 0$, we provide a sufficient smoothness condition on the loss function such that a first order discretization can reach $\epsilon$ expected generalization error given $\Omega(\epsilon^{-1/N} \log(\epsilon^{-1}))$ iterations with $\Omega(\epsilon^{-1})$ samples.
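To give a feel for the rates quoted in the abstract, the following is a minimal sketch that evaluates the iteration bound with every hidden constant set to 1. The function name `iteration_scaling` and the parameter `order` (standing in for N) are hypothetical and introduced here for illustration only. The true constants depend on the loss function and are not stated in the abstract, so the numbers indicate only how the bound scales.

```python
import math


def iteration_scaling(eps: float, order: float = 1.0) -> float:
    """Return eps**(-1/order) * log(1/eps), the iteration bound with constants dropped.

    `order` plays the role of N in the abstract (an assumption of this sketch);
    order=1 recovers the previously known eps^{-1} log(eps^{-1}) rate.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if order <= 0:
        raise ValueError("order must be positive")
    return eps ** (-1.0 / order) * math.log(1.0 / eps)


if __name__ == "__main__":
    eps = 1e-4  # illustrative tolerance, not a value taken from the paper
    for n in (1, 2, 4):
        print(f"N={n}: ~{iteration_scaling(eps, n):.1f} iterations (up to constants)")
```

For a fixed tolerance, a larger N shrinks the polynomial factor while the logarithmic factor stays the same. The sample requirement $\Omega(\epsilon^{-1})$ does not change with N.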
