Demystifying SGD with Doubly Stochastic Gradients
Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner
3 June 2024
ArXiv (abs) · PDF · HTML
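For context on the title term, the sketch below illustrates what a doubly stochastic gradient estimator looks like: it mixes two independent sources of randomness, minibatch subsampling of the data and a fresh Monte Carlo sample for each selected point (as in reparameterized black-box variational inference). This is an illustrative sketch only, not code from the paper; the function and parameter names are hypothetical.

    import numpy as np

    def doubly_stochastic_grad(params, data, per_point_grad,
                               batch_size=32, mc_samples=4, rng=None):
        """Gradient estimate built from two sources of noise (illustrative only)."""
        rng = np.random.default_rng() if rng is None else rng
        n = len(data)
        # First source of randomness: subsample a minibatch of data points.
        batch = rng.choice(n, size=min(batch_size, n), replace=False)
        grad = np.zeros_like(params)
        for i in batch:
            for _ in range(mc_samples):
                # Second source of randomness: a fresh Monte Carlo draw per point
                # (e.g. the noise variable in a reparameterized gradient).
                eps = rng.standard_normal(np.shape(params))
                grad += per_point_grad(params, data[i], eps)
        # Rescale so the estimate targets the sum over all n data points.
        return grad * (n / (len(batch) * mc_samples))

    # Hypothetical usage with a toy per-point gradient:
    # g = doubly_stochastic_grad(np.zeros(3), list_of_points,
    #                            lambda p, x, eps: p - x + 0.01 * eps)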

Papers citing "Demystifying SGD with Doubly Stochastic Gradients" (41 of 41 papers shown):
  • LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models. Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, ..., Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, Chongxuan Li. 25 May 2025.
  • Provable convergence guarantees for black-box variational inference. Justin Domke, Guillaume Garrigos, Robert Mansel Gower. 04 Jun 2023.
  • On the Convergence of Black-Box Variational Inference. Kyurae Kim, Jisu Oh, Kaiwen Wu, Yi-An Ma, Jacob R. Gardner. 24 May 2023. [BDL]
  • Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond. Jaeyoung Cha, Jaewook Lee, Chulhee Yun. 13 Mar 2023.
  • Explicit Regularization in Overparametrized Models via Noise Injection. Antonio Orvieto, Anant Raj, Hans Kersting, Francis R. Bach. 09 Jun 2022.
  • Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems. Itay Safran, Ohad Shamir. 12 Jun 2021.
  • Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization. Tianyi Liu, Yan Li, S. Wei, Enlu Zhou, T. Zhao. 24 Feb 2021.
  • Variance-Reduced Methods for Machine Learning. Robert Mansel Gower, Mark Schmidt, Francis R. Bach, Peter Richtárik. 02 Oct 2020.
  • Denoising Diffusion Probabilistic Models. Jonathan Ho, Ajay Jain, Pieter Abbeel. 19 Jun 2020. [DiffM]
  • SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation. Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou. 18 Jun 2020.
  • SGD with shuffling: optimal rates without component convexity and large epoch requirements. Kwangjun Ahn, Chulhee Yun, S. Sra. 12 Jun 2020.
  • Random Reshuffling: Simple Analysis with Vast Improvements. Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik. 10 Jun 2020.
  • A Unified Convergence Analysis for Shuffling-Type Gradient Methods. Lam M. Nguyen, Quoc Tran-Dinh, Dzung Phan, Phuong Ha Nguyen, Marten van Dijk. 19 Feb 2020.
  • Decision-Making with Auto-Encoding Variational Bayes. Romain Lopez, Pierre Boyeau, Nir Yosef, Michael I. Jordan, Jeffrey Regier. 17 Feb 2020. [BDL]
  • Better Theory for SGD in the Nonconvex World. Ahmed Khaled, Peter Richtárik. 09 Feb 2020.
  • How Good is SGD with Random Shuffling? Itay Safran, Ohad Shamir. 31 Jul 2019.
  • Generative Modeling by Estimating Gradients of the Data Distribution. Yang Song, Stefano Ermon. 12 Jul 2019. [SyDa, DiffM]
  • Unified Optimal Analysis of the (Stochastic) Gradient Method. Sebastian U. Stich. 09 Jul 2019.
  • Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond. Oliver Hinder, Aaron Sidford, N. Sohoni. 27 Jun 2019.
  • Monte Carlo Gradient Estimation in Machine Learning. S. Mohamed, Mihaela Rosca, Michael Figurnov, A. Mnih. 25 Jun 2019.
  • Provable Gradient Variance Guarantees for Black-Box Variational Inference. Justin Domke. 19 Jun 2019. [DRL]
  • A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent. Eduard A. Gorbunov, Filip Hanzely, Peter Richtárik. 27 May 2019.
  • SGD: General Analysis and Improved Rates. Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtárik. 27 Jan 2019.
  • Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise. A. Kulunchakov, Julien Mairal. 25 Jan 2019.
  • Provable Smoothness Guarantees for Black-Box Variational Inference. Justin Domke. 24 Jan 2019.
  • Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron. Sharan Vaswani, Francis R. Bach, Mark Schmidt. 16 Oct 2018.
  • Variance reduction properties of the reparameterization trick. Ming Xu, M. Quiroz, Robert Kohn, Scott A. Sisson. 27 Sep 2018. [AAML]
  • Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data. Shuai Zheng, James T. Kwok. 08 Jun 2018.
  • SGD and Hogwild! Convergence Without the Bounded Gradients Assumption. Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, K. Scheinberg, Martin Takáč. 11 Feb 2018.
  • The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning. Siyuan Ma, Raef Bassily, M. Belkin. 18 Dec 2017.
  • Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure. A. Bietti, Julien Mairal. 04 Oct 2016.
  • Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. Hamed Karimi, J. Nutini, Mark Schmidt. 16 Aug 2016.
  • Optimization Methods for Large-Scale Machine Learning. Léon Bottou, Frank E. Curtis, J. Nocedal. 15 Jun 2016.
  • Automatic Differentiation Variational Inference. A. Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei. 02 Mar 2016.
  • Variational Dropout and the Local Reparameterization Trick. Diederik P. Kingma, Tim Salimans, Max Welling. 08 Jun 2015. [BDL]
  • Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients. Bo Xie, Yingyu Liang, Le Song. 14 Apr 2015.
  • Deep Unsupervised Learning using Nonequilibrium Thermodynamics. Jascha Narain Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli. 12 Mar 2015. [SyDa, DiffM]
  • Scalable Kernel Methods via Doubly Stochastic Gradients. Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina Balcan, Le Song. 21 Jul 2014.
  • Black Box Variational Inference. Rajesh Ranganath, S. Gerrish, David M. Blei. 31 Dec 2013. [DRL, BDL]
  • Parallel Coordinate Descent Methods for Big Data Optimization. Peter Richtárik, Martin Takáč. 04 Dec 2012.
  • Randomized Smoothing for Stochastic Optimization. John C. Duchi, Peter L. Bartlett, Martin J. Wainwright. 22 Mar 2011.