arXiv:2409.08770 (v3, latest)
Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent
17 February 2025
Hikaru Umeda, Hideaki Iiduka
ArXiv (abs) · PDF · HTML · GitHub
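
The paper's core idea, reflected in its title, is to increase the batch size and the learning rate together during training instead of decaying the learning rate alone. A minimal PyTorch-style sketch of such a joint stage-wise schedule follows; the model, data, stage lengths, and growth factors are illustrative assumptions, not the authors' exact schedule (their analysis places conditions on how fast each quantity may grow).

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Toy regression task; a stand-in for any supervised setup.
    X, y = torch.randn(4096, 20), torch.randn(4096, 1)
    model = torch.nn.Linear(20, 1)
    loss_fn = torch.nn.MSELoss()

    batch_size, lr = 32, 0.01            # initial values (assumed)
    batch_growth, lr_growth = 2.0, 1.5   # per-stage growth factors (assumed)
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    for stage in range(4):               # each stage trains at a fixed (batch_size, lr)
        loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
        for epoch in range(5):
            for xb, yb in loader:
                opt.zero_grad()
                loss_fn(model(xb), yb).backward()
                opt.step()
        # Increase BOTH the batch size and the learning rate for the next stage.
        batch_size = int(batch_size * batch_growth)
        lr *= lr_growth
        for group in opt.param_groups:
            group["lr"] = lr

Rebuilding the DataLoader at each stage is the simplest way to change the batch size in PyTorch; a custom BatchSampler would avoid the rebuild but would obscure the sketch.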
Papers citing "Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent" (27 papers)

Convergence Analysis of SGD under Expected Smoothness
Yuta Kawamoto, Hideaki Iiduka. 23 Oct 2025.

Tri-Accel: Curvature-Aware Precision-Adaptive and Memory-Elastic Optimization for Efficient GPU Usage
Mohsen Sheibanian, Pouya Shaeri, Alimohammad Beigi, Ryan T. Woo, Aryan Keluskar. 23 Aug 2025.

Explainable Learning Rate Regimes for Stochastic Optimization
Zhuang Yang. 19 Aug 2025.

Accelerating SGDM via Learning Rate and Batch Size Schedules: A Lyapunov-Based Analysis
Yuichi Kondo, Hideaki Iiduka. 05 Aug 2025.

Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size
Kento Imaizumi, Hideaki Iiduka. 30 Jun 2025.

Enlightenment Period Improving DNN Performance
Tiantian Liu, Meng Wan, Jue Wang. 02 Apr 2025.

Increasing Batch Size Improves Convergence of Stochastic Gradient Descent with Momentum
Keisuke Kamo, Hideaki Iiduka. 15 Jan 2025.

Relationship between Batch Size and Number of Steps Needed for Nonconvex Optimization of Stochastic Gradient Descent using Armijo Line Search
Yuki Tsukada, Hideaki Iiduka. 25 Jul 2023.

On the Convergence of Step Decay Step-Size for Stochastic Optimization
Xiaoyu Wang, Sindri Magnússon, Mikael Johansson. Neural Information Processing Systems (NeurIPS), 2021. 18 Feb 2021.

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
Nicolas Loizou, Sharan Vaswani, Issam H. Laradji, Damien Scieur. International Conference on Artificial Intelligence and Statistics (AISTATS), 2021. 24 Feb 2020.

Better Theory for SGD in the Nonconvex World
Ahmed Khaled, Peter Richtárik. 09 Feb 2020.

On the Variance of the Adaptive Learning Rate and Beyond
Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han. International Conference on Learning Representations (ICLR), 2020. 08 Aug 2019.

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger B. Grosse. Neural Information Processing Systems (NeurIPS), 2019. 09 Jul 2019.

Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Sharan Vaswani, Aaron Mishkin, Issam H. Laradji, Mark Schmidt, Gauthier Gidel, Damien Scieur. Neural Information Processing Systems (NeurIPS), 2019. 24 May 2019.

Convergence rates for the stochastic gradient descent method for non-convex objective functions
Benjamin J. Fehrman, Benjamin Gess, Arnulf Jentzen. 02 Apr 2019.

sharpDARTS: Faster and More Accurate Differentiable Architecture Search
Andrew Hundt, Varun Jain, Gregory D. Hager. 23 Mar 2019.

Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li. 04 Dec 2018.

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl. Journal of Machine Learning Research (JMLR), 2019. 08 Nov 2018.

A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Akhilesh Deepak Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher. 29 Oct 2018.

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le. 01 Nov 2017.

Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Neural Information Processing Systems (NeurIPS), 2017. 12 Jun 2017.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He. 08 Jun 2017.

Coupling Adaptive Batch Sizes with Learning Rates
Lukas Balles, Javier Romero, Philipp Hennig. Conference on Uncertainty in Artificial Intelligence (UAI), 2017. 15 Dec 2016.

SGDR: Stochastic Gradient Descent with Warm Restarts
Ilya Loshchilov, Frank Hutter. 13 Aug 2016.

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018. 02 Jun 2016.

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. 10 Dec 2015.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy. 11 Feb 2015.