
Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification
Papers citing "Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification"
6 of 21 citing papers are listed below.
| Title | Venue, Year |
|---|---|
| On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift | Annual Conference on Computational Learning Theory (COLT), 2019 |
| The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares | Neural Information Processing Systems (NeurIPS), 2019 |
| On the insufficiency of existing momentum schemes for Stochastic Optimization | Information Theory and Applications Workshop (ITA), 2018 |
| A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares) | Foundations of Software Technology and Theoretical Computer Science (FSTTCS), 2017 |
| Online to Offline Conversions, Universality and Adaptive Minibatch Sizes | Neural Information Processing Systems (NeurIPS), 2017 |
| Stochastic Composite Least-Squares Regression with convergence rate O(1/n) | Annual Conference on Computational Learning Theory (COLT), 2017 |