Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments
Motivated by neural network training in low-bit floating-point and fixed-point environments, this work studies the convergence of variants of SGD that use adaptive step sizes in the presence of computational error. For a general stochastic Lipschitz continuous loss function, an asymptotic convergence result to a Clarke stationary point and a non-asymptotic convergence result to an approximate stationary point are presented, assuming that only an approximation of the loss function's stochastic gradient can be computed and that the SGD step itself is computed with error. Several variants of SGD are tested empirically in a variety of low-precision arithmetic environments, where improved test set accuracy over standard SGD is observed on two image recognition tasks.
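As a rough illustration of the setting described above (not the paper's algorithm), the sketch below runs an AdaGrad-style adaptive-step SGD on a nonsmooth but Lipschitz continuous loss (mean absolute error of a linear model) while rounding both the stochastic subgradient and the parameter update to a fixed-point grid. The quantization scheme, loss, step-size rule, and all names here are assumptions made for illustration only.

```python
# Minimal sketch: adaptive-step SGD with simulated low-precision (fixed-point) error
# in both the gradient computation and the update step.  Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, frac_bits=8):
    """Round to a fixed-point grid with `frac_bits` fractional bits (assumed scheme)."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def stochastic_subgrad(w, X, y, batch=32):
    """Stochastic subgradient of the (nonsmooth, Lipschitz) mean absolute error."""
    idx = rng.integers(0, X.shape[0], size=batch)
    residual = X[idx] @ w - y[idx]
    return X[idx].T @ np.sign(residual) / batch

# Synthetic regression data.
n, d = 1000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
grad_sq_sum = 1e-8   # accumulator for the AdaGrad-like adaptive step size
eta = 0.5

for t in range(2000):
    g = quantize(stochastic_subgrad(w, X, y))   # gradient available only with rounding error
    grad_sq_sum += np.sum(g * g)
    step = eta / np.sqrt(grad_sq_sum)           # adaptive step size
    w = quantize(w - step * g)                  # the SGD step itself is also rounded

print("final mean absolute error:", np.mean(np.abs(X @ w - y)))
```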