arXiv 2406.00920 · v2 (latest)
Demystifying SGD with Doubly Stochastic Gradients
3 June 2024
Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner

Papers citing "Demystifying SGD with Doubly Stochastic Gradients" (41 of 41 papers shown)

LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, ..., Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, Chongxuan Li · 25 May 2025

Provable convergence guarantees for black-box variational inference
Justin Domke, Guillaume Garrigos, Robert Mansel Gower · 04 Jun 2023

On the Convergence of Black-Box Variational Inference [BDL]
Kyurae Kim, Jisu Oh, Kaiwen Wu, Yi-An Ma, Jacob R. Gardner · 24 May 2023

Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond
Jaeyoung Cha, Jaewook Lee, Chulhee Yun · 13 Mar 2023

Explicit Regularization in Overparametrized Models via Noise Injection
Antonio Orvieto, Anant Raj, Hans Kersting, Francis R. Bach · 09 Jun 2022

Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems
Itay Safran, Ohad Shamir · 12 Jun 2021

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization
Tianyi Liu, Yan Li, S. Wei, Enlu Zhou, T. Zhao · 24 Feb 2021

Variance-Reduced Methods for Machine Learning
Robert Mansel Gower, Mark Schmidt, Francis R. Bach, Peter Richtárik · 02 Oct 2020

Denoising Diffusion Probabilistic Models [DiffM]
Jonathan Ho, Ajay Jain, Pieter Abbeel · 19 Jun 2020

SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation
Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou · 18 Jun 2020

SGD with shuffling: optimal rates without component convexity and large epoch requirements
Kwangjun Ahn, Chulhee Yun, S. Sra · 12 Jun 2020

Random Reshuffling: Simple Analysis with Vast Improvements
Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik · 10 Jun 2020

A Unified Convergence Analysis for Shuffling-Type Gradient Methods
Lam M. Nguyen, Quoc Tran-Dinh, Dzung Phan, Phuong Ha Nguyen, Marten van Dijk · 19 Feb 2020

Decision-Making with Auto-Encoding Variational Bayes [BDL]
Romain Lopez, Pierre Boyeau, Nir Yosef, Michael I. Jordan, Jeffrey Regier · 17 Feb 2020

Better Theory for SGD in the Nonconvex World
Ahmed Khaled, Peter Richtárik · 09 Feb 2020

How Good is SGD with Random Shuffling?
Itay Safran, Ohad Shamir · 31 Jul 2019

Generative Modeling by Estimating Gradients of the Data Distribution [SyDa, DiffM]
Yang Song, Stefano Ermon · 12 Jul 2019

Unified Optimal Analysis of the (Stochastic) Gradient Method
Sebastian U. Stich · 09 Jul 2019

Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond
Oliver Hinder, Aaron Sidford, N. Sohoni · 27 Jun 2019

Monte Carlo Gradient Estimation in Machine Learning
S. Mohamed, Mihaela Rosca, Michael Figurnov, A. Mnih · 25 Jun 2019

Provable Gradient Variance Guarantees for Black-Box Variational Inference [DRL]
Justin Domke · 19 Jun 2019

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent
Eduard A. Gorbunov, Filip Hanzely, Peter Richtárik · 27 May 2019

SGD: General Analysis and Improved Rates
Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtárik · 27 Jan 2019

Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise
A. Kulunchakov, Julien Mairal · 25 Jan 2019

Provable Smoothness Guarantees for Black-Box Variational Inference
Justin Domke · 24 Jan 2019

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
Sharan Vaswani, Francis R. Bach, Mark Schmidt · 16 Oct 2018

Variance reduction properties of the reparameterization trick [AAML]
Ming Xu, M. Quiroz, Robert Kohn, Scott A. Sisson · 27 Sep 2018

Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data
Shuai Zheng, James T. Kwok · 08 Jun 2018

SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, K. Scheinberg, Martin Takáč · 11 Feb 2018

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma, Raef Bassily, M. Belkin · 18 Dec 2017

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
A. Bietti, Julien Mairal · 04 Oct 2016

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi, J. Nutini, Mark Schmidt · 16 Aug 2016

Optimization Methods for Large-Scale Machine Learning
Léon Bottou, Frank E. Curtis, J. Nocedal · 15 Jun 2016

Automatic Differentiation Variational Inference
A. Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei · 02 Mar 2016

Variational Dropout and the Local Reparameterization Trick [BDL]
Diederik P. Kingma, Tim Salimans, Max Welling · 08 Jun 2015

Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients
Bo Xie, Yingyu Liang, Le Song · 14 Apr 2015

Deep Unsupervised Learning using Nonequilibrium Thermodynamics [SyDa, DiffM]
Jascha Narain Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli · 12 Mar 2015

Scalable Kernel Methods via Doubly Stochastic Gradients
Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina Balcan, Le Song · 21 Jul 2014

Black Box Variational Inference [DRL, BDL]
Rajesh Ranganath, S. Gerrish, David M. Blei · 31 Dec 2013

Parallel Coordinate Descent Methods for Big Data Optimization
Peter Richtárik, Martin Takáč · 04 Dec 2012

Randomized Smoothing for Stochastic Optimization
John C. Duchi, Peter L. Bartlett, Martin J. Wainwright · 22 Mar 2011