U-Clip: On-Average Unbiased Stochastic Gradient Clipping

Bryn Elesedy
Marcus Hutter
Abstract

U-Clip is a simple amendment to gradient clipping that can be applied to any iterative gradient optimization algorithm. Like regular clipping, U-Clip uses gradients that are clipped to a prescribed size (e.g. with component-wise or norm-based clipping), but instead of discarding the clipped portion of the gradient, U-Clip maintains a buffer of these values that is added to the gradient on the next iteration (before clipping). We show that the cumulative bias of the U-Clip updates is bounded by a constant. This implies that the clipped updates are unbiased on average. Convergence follows via a lemma that guarantees convergence with updates $u_i$ as long as $\sum_{i=1}^t (u_i - g_i) = o(t)$, where $g_i$ are the gradients. Extensive experimental exploration is performed on CIFAR10, with further validation on ImageNet.
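The buffer mechanism can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes component-wise clipping to $[-c, c]$ (norm-based clipping would work the same way), a zero initial buffer, and plain Python lists in place of tensors; the function names `clip_componentwise`, `u_clip_step`, `run_u_clip` and `cumulative_bias` are hypothetical. Writing $b_t$ for the buffer after step $t$, each update is $u_t = g_t + b_{t-1} - b_t$, so with $b_0 = 0$ the cumulative bias telescopes to $\sum_{i=1}^t (u_i - g_i) = -b_t$. The sketch checks this identity numerically; the paper's constant bound on the buffer depends on its own assumptions, which are not reproduced here.

```python
from typing import List, Tuple

Vector = List[float]


def clip_componentwise(v: Vector, c: float) -> Vector:
    """Clip each coordinate of v to the interval [-c, c] (assumed clipping rule)."""
    return [max(-c, min(c, x)) for x in v]


def u_clip_step(grad: Vector, buffer: Vector, c: float) -> Tuple[Vector, Vector]:
    """One hypothetical U-Clip step: add the carried-over buffer to the fresh
    gradient, clip the sum, and keep the clipped-off remainder as the new buffer."""
    total = [g + b for g, b in zip(grad, buffer)]
    update = clip_componentwise(total, c)
    new_buffer = [t - u for t, u in zip(total, update)]
    return update, new_buffer


def run_u_clip(grads: List[Vector], c: float) -> Tuple[List[Vector], Vector]:
    """Apply U-Clip to a sequence of gradients, starting from a zero buffer."""
    buffer = [0.0] * len(grads[0])
    updates = []
    for g in grads:
        u, buffer = u_clip_step(g, buffer, c)
        updates.append(u)
    return updates, buffer


def cumulative_bias(updates: List[Vector], grads: List[Vector]) -> Vector:
    """Per-coordinate sum over steps of (u_i - g_i)."""
    return [sum(u[j] - g[j] for u, g in zip(updates, grads)) for j in range(len(grads[0]))]


if __name__ == "__main__":
    grads = [[3.0, -0.5], [0.25, 0.5], [-1.0, 2.5]]
    updates, buffer = run_u_clip(grads, c=1.0)
    print(updates)                          # [[1.0, -0.5], [1.0, 0.5], [0.25, 1.0]]
    print(buffer)                           # [0.0, 1.5]
    print(cumulative_bias(updates, grads))  # [0.0, -1.5], i.e. minus the buffer
```

In this toy run the large first component (3.0) is not lost: the excess is released over the next two steps, and whatever has not yet been applied sits in the buffer, which is exactly the accumulated bias with its sign flipped.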
