Demystifying SGD with Doubly Stochastic Gradients
Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner
3 June 2024
ArXiv (abs) · PDF · HTML
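For context on the title term, the sketch below illustrates what a doubly stochastic gradient estimator looks like: it mixes two independent sources of randomness, minibatch subsampling of the data and a fresh Monte Carlo sample for each selected point (as in reparameterized black-box variational inference). This is an illustrative sketch only, not code from the paper; the function and parameter names are hypothetical.

    import numpy as np

    def doubly_stochastic_grad(params, data, per_point_grad,
                               batch_size=32, mc_samples=4, rng=None):
        """Gradient estimate built from two sources of noise (illustrative only)."""
        rng = np.random.default_rng() if rng is None else rng
        n = len(data)
        # First source of randomness: subsample a minibatch of data points.
        batch = rng.choice(n, size=min(batch_size, n), replace=False)
        grad = np.zeros_like(params)
        for i in batch:
            for _ in range(mc_samples):
                # Second source of randomness: a fresh Monte Carlo draw per point
                # (e.g. the noise variable in a reparameterized gradient).
                eps = rng.standard_normal(np.shape(params))
                grad += per_point_grad(params, data[i], eps)
        # Rescale so the estimate targets the sum over all n data points.
        return grad * (n / (len(batch) * mc_samples))

    # Hypothetical usage with a toy per-point gradient:
    # g = doubly_stochastic_grad(np.zeros(3), list_of_points,
    #                            lambda p, x, eps: p - x + 0.01 * eps)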

Papers citing "Demystifying SGD with Doubly Stochastic Gradients" (41 of 41 papers shown):
  • LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models. Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, ..., Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, Chongxuan Li. 25 May 2025.
  • Provable convergence guarantees for black-box variational inference. Justin Domke, Guillaume Garrigos, Robert Mansel Gower. 04 Jun 2023.
  • On the Convergence of Black-Box Variational Inference. Kyurae Kim, Jisu Oh, Kaiwen Wu, Yi-An Ma, Jacob R. Gardner. 24 May 2023. [BDL]
  • Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond. Jaeyoung Cha, Jaewook Lee, Chulhee Yun. 13 Mar 2023.
  • Explicit Regularization in Overparametrized Models via Noise Injection. Antonio Orvieto, Anant Raj, Hans Kersting, Francis R. Bach. 09 Jun 2022.
  • Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems. Itay Safran, Ohad Shamir. 12 Jun 2021.
  • Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization. Tianyi Liu, Yan Li, S. Wei, Enlu Zhou, T. Zhao. 24 Feb 2021.
  • Variance-Reduced Methods for Machine Learning. Robert Mansel Gower, Mark Schmidt, Francis R. Bach, Peter Richtárik. 02 Oct 2020.
  • Denoising Diffusion Probabilistic Models. Jonathan Ho, Ajay Jain, Pieter Abbeel. 19 Jun 2020. [DiffM]
  • SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation. Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou. 18 Jun 2020.
  • SGD with shuffling: optimal rates without component convexity and large epoch requirements. Kwangjun Ahn, Chulhee Yun, S. Sra. 12 Jun 2020.
  • Random Reshuffling: Simple Analysis with Vast Improvements. Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik. 10 Jun 2020.
  • A Unified Convergence Analysis for Shuffling-Type Gradient Methods. Lam M. Nguyen, Quoc Tran-Dinh, Dzung Phan, Phuong Ha Nguyen, Marten van Dijk. 19 Feb 2020.
  • Decision-Making with Auto-Encoding Variational Bayes. Romain Lopez, Pierre Boyeau, Nir Yosef, Michael I. Jordan, Jeffrey Regier. 17 Feb 2020. [BDL]
  • Better Theory for SGD in the Nonconvex World. Ahmed Khaled, Peter Richtárik. 09 Feb 2020.
  • How Good is SGD with Random Shuffling? Itay Safran, Ohad Shamir. 31 Jul 2019.
  • Generative Modeling by Estimating Gradients of the Data Distribution. Yang Song, Stefano Ermon. 12 Jul 2019. [SyDa, DiffM]
  • Unified Optimal Analysis of the (Stochastic) Gradient Method. Sebastian U. Stich. 09 Jul 2019.
  • Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond. Oliver Hinder, Aaron Sidford, N. Sohoni. 27 Jun 2019.
  • Monte Carlo Gradient Estimation in Machine Learning. S. Mohamed, Mihaela Rosca, Michael Figurnov, A. Mnih. 25 Jun 2019.
  • Provable Gradient Variance Guarantees for Black-Box Variational Inference. Justin Domke. 19 Jun 2019. [DRL]
  • A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent. Eduard A. Gorbunov, Filip Hanzely, Peter Richtárik. 27 May 2019.
  • SGD: General Analysis and Improved Rates. Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtárik. 27 Jan 2019.
  • Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise. A. Kulunchakov, Julien Mairal. 25 Jan 2019.
  • Provable Smoothness Guarantees for Black-Box Variational Inference. Justin Domke. 24 Jan 2019.
  • Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron. Sharan Vaswani, Francis R. Bach, Mark Schmidt. 16 Oct 2018.
  • Variance reduction properties of the reparameterization trick. Ming Xu, M. Quiroz, Robert Kohn, Scott A. Sisson. 27 Sep 2018. [AAML]
  • Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data. Shuai Zheng, James T. Kwok. 08 Jun 2018.
  • SGD and Hogwild! Convergence Without the Bounded Gradients Assumption. Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, K. Scheinberg, Martin Takáč. 11 Feb 2018.
  • The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning. Siyuan Ma, Raef Bassily, M. Belkin. 18 Dec 2017.
  • Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure. A. Bietti, Julien Mairal. 04 Oct 2016.
  • Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. Hamed Karimi, J. Nutini, Mark Schmidt. 16 Aug 2016.
  • Optimization Methods for Large-Scale Machine Learning. Léon Bottou, Frank E. Curtis, J. Nocedal. 15 Jun 2016.
  • Automatic Differentiation Variational Inference. A. Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei. 02 Mar 2016.
  • Variational Dropout and the Local Reparameterization Trick. Diederik P. Kingma, Tim Salimans, Max Welling. 08 Jun 2015. [BDL]
  • Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients. Bo Xie, Yingyu Liang, Le Song. 14 Apr 2015.
  • Deep Unsupervised Learning using Nonequilibrium Thermodynamics. Jascha Narain Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli. 12 Mar 2015. [SyDa, DiffM]
  • Scalable Kernel Methods via Doubly Stochastic Gradients. Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina Balcan, Le Song. 21 Jul 2014.
  • Black Box Variational Inference. Rajesh Ranganath, S. Gerrish, David M. Blei. 31 Dec 2013. [DRL, BDL]
  • Parallel Coordinate Descent Methods for Big Data Optimization. Peter Richtárik, Martin Takáč. 04 Dec 2012.
  • Randomized Smoothing for Stochastic Optimization. John C. Duchi, Peter L. Bartlett, Martin J. Wainwright. 22 Mar 2011.