The Convergence of Sparsified Gradient Methods. Neural Information Processing Systems (NeurIPS), 2018.
Gradient Sparsification for Communication-Efficient Distributed Optimization. Neural Information Processing Systems (NeurIPS), 2018.
Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. International Conference on Learning Representations (ICLR), 2015.
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. International Conference on Machine Learning (ICML), 2012.