arXiv:2406.00920
v1
v2 (latest)
Demystifying SGD with Doubly Stochastic Gradients
3 June 2024
Kyurae Kim
Joohwan Ko
Yi-An Ma
Jacob R. Gardner
Papers citing
"Demystifying SGD with Doubly Stochastic Gradients"
41 papers
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu
Rongzhen Wang
Shen Nie
Xiaolu Zhang
Chunwei Wu
...
Jun Zhou
Jianfei Chen
Yankai Lin
Ji-Rong Wen
Chongxuan Li
241
23
0
25 May 2025
Provable convergence guarantees for black-box variational inference
Justin Domke
Guillaume Garrigos
Robert Mansel Gower
150
21
0
04 Jun 2023
On the Convergence of Black-Box Variational Inference
Kyurae Kim
Jisu Oh
Kaiwen Wu
Yi-An Ma
Jacob R. Gardner
BDL
132
17
0
24 May 2023
Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond
Jaeyoung Cha
Jaewook Lee
Chulhee Yun
144
26
0
13 Mar 2023
Explicit Regularization in Overparametrized Models via Noise Injection
Antonio Orvieto
Anant Raj
Hans Kersting
Francis R. Bach
104
29
0
09 Jun 2022
Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems
Itay Safran
Ohad Shamir
112
20
0
12 Jun 2021
Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization
Tianyi Liu
Yan Li
S. Wei
Enlu Zhou
T. Zhao
93
16
0
24 Feb 2021
Variance-Reduced Methods for Machine Learning
Robert Mansel Gower
Mark Schmidt
Francis R. Bach
Peter Richtárik
149
130
0
02 Oct 2020
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
2.0K
21,059
0
19 Jun 2020
SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation
Robert Mansel Gower
Othmane Sebbouh
Nicolas Loizou
203
79
0
18 Jun 2020
SGD with shuffling: optimal rates without component convexity and large epoch requirements
Kwangjun Ahn
Chulhee Yun
S. Sra
150
68
0
12 Jun 2020
Random Reshuffling: Simple Analysis with Vast Improvements
Konstantin Mishchenko
Ahmed Khaled
Peter Richtárik
205
140
0
10 Jun 2020
A Unified Convergence Analysis for Shuffling-Type Gradient Methods
Lam M. Nguyen
Quoc Tran-Dinh
Dzung Phan
Phuong Ha Nguyen
Marten van Dijk
153
84
0
19 Feb 2020
Decision-Making with Auto-Encoding Variational Bayes
Romain Lopez
Pierre Boyeau
Nir Yosef
Michael I. Jordan
Jeffrey Regier
BDL
1.0K
10,591
0
17 Feb 2020
Better Theory for SGD in the Nonconvex World
Ahmed Khaled
Peter Richtárik
195
194
0
09 Feb 2020
How Good is SGD with Random Shuffling?
Itay Safran
Ohad Shamir
209
86
0
31 Jul 2019
Generative Modeling by Estimating Gradients of the Data Distribution
Yang Song
Stefano Ermon
SyDa
DiffM
487
4,266
0
12 Jul 2019
Unified Optimal Analysis of the (Stochastic) Gradient Method
Sebastian U. Stich
150
118
0
09 Jul 2019
Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond
Oliver Hinder
Aaron Sidford
N. Sohoni
175
76
0
27 Jun 2019
Monte Carlo Gradient Estimation in Machine Learning
S. Mohamed
Mihaela Rosca
Michael Figurnov
A. Mnih
165
439
0
25 Jun 2019
Provable Gradient Variance Guarantees for Black-Box Variational Inference
Justin Domke
DRL
81
23
0
19 Jun 2019
A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent
Eduard A. Gorbunov
Filip Hanzely
Peter Richtárik
159
149
0
27 May 2019
SGD: General Analysis and Improved Rates
Robert Mansel Gower
Nicolas Loizou
Xun Qian
Alibek Sailanbayev
Egor Shulgin
Peter Richtárik
161
402
0
27 Jan 2019
Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise
A. Kulunchakov
Julien Mairal
201
45
0
25 Jan 2019
Provable Smoothness Guarantees for Black-Box Variational Inference
Justin Domke
108
36
0
24 Jan 2019
Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
Sharan Vaswani
Francis R. Bach
Mark Schmidt
205
305
0
16 Oct 2018
Variance reduction properties of the reparameterization trick
Ming Xu
M. Quiroz
Robert Kohn
Scott A. Sisson
AAML
145
71
0
27 Sep 2018
Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data
Shuai Zheng
James T. Kwok
89
9
0
08 Jun 2018
SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
Lam M. Nguyen
Phuong Ha Nguyen
Marten van Dijk
Peter Richtárik
K. Scheinberg
Martin Takáč
155
235
0
11 Feb 2018
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma
Raef Bassily
M. Belkin
175
299
0
18 Dec 2017
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
A. Bietti
Julien Mairal
282
36
0
04 Oct 2016
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi
J. Nutini
Mark Schmidt
620
1,283
0
16 Aug 2016
Optimization Methods for Large-Scale Machine Learning
Léon Bottou
Frank E. Curtis
J. Nocedal
619
3,325
0
15 Jun 2016
Automatic Differentiation Variational Inference
A. Kucukelbir
Dustin Tran
Rajesh Ranganath
Andrew Gelman
David M. Blei
187
732
0
02 Mar 2016
Variational Dropout and the Local Reparameterization Trick
Diederik P. Kingma
Tim Salimans
Max Welling
BDL
428
1,554
0
08 Jun 2015
Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients
Bo Xie
Yingyu Liang
Le Song
220
44
0
14 Apr 2015
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Jascha Narain Sohl-Dickstein
Eric A. Weiss
Niru Maheswaranathan
Surya Ganguli
SyDa
DiffM
855
7,671
0
12 Mar 2015
Scalable Kernel Methods via Doubly Stochastic Gradients
Bo Dai
Bo Xie
Niao He
Yingyu Liang
Anant Raj
Maria-Florina Balcan
Le Song
310
230
0
21 Jul 2014
Black Box Variational Inference
Rajesh Ranganath
S. Gerrish
David M. Blei
DRL
BDL
293
1,184
0
31 Dec 2013
Parallel Coordinate Descent Methods for Big Data Optimization
Peter Richtárik
Martin Takáč
211
487
0
04 Dec 2012
Randomized Smoothing for Stochastic Optimization
John C. Duchi
Peter L. Bartlett
Martin J. Wainwright
219
293
0
22 Mar 2011