Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le · 1 November 2017 · arXiv:1711.00489 · [ODL]
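The paper behind this listing argues that decaying the learning rate ε and increasing the batch size B are interchangeable: SGD's gradient-noise scale is roughly g ≈ εN/B for a training set of size N, so multiplying B by the same factor by which a schedule would divide ε preserves the noise schedule while taking far fewer parameter updates. A minimal sketch of that swap, assuming PyTorch; the toy model, dataset, milestone epochs, and 5x factor are illustrative placeholders, not the authors' exact experimental setup:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy classifier and data (hypothetical stand-ins for a real workload).
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
data = TensorDataset(torch.randn(2048, 10), torch.randint(0, 2, (2048,)))

batch_size = 64
milestones = {30, 60, 80}  # epochs where a step schedule would decay lr 5x

for epoch in range(90):
    if epoch in milestones:
        batch_size *= 5  # grow B instead of dividing lr by 5
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    for x, y in loader:
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
```

Because the per-step noise scale εN/B is held roughly constant, both variants should trace similar training trajectories, but the growing-batch run performs fewer optimizer steps, which is what makes it attractive for data-parallel hardware.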

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

Showing 50 of 173 citing papers.
• Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
  Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness · 19 May 2025
• Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
  Yezhen Wang, Zhouhao Yang, Brian K Chen, Fanyi Pu, Bo-wen Li, Tianyu Gao, Kenji Kawaguchi · 3 May 2025
• Linear Mode Connectivity in Differentiable Tree Ensembles
  Ryuichi Kanoh, M. Sugiyama · 17 Feb 2025
• Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent
  Hikaru Umeda, Hideaki Iiduka · 17 Feb 2025
• On the use of neural networks for the structural characterization of polymeric porous materials
  Jorge Torre, Suset Barroso-Solares, M.A. Rodríguez-Pérez, Javier Pinto · 25 Jan 2025
• Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
  Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim · 10 Jan 2025
• Towards Precise Scaling Laws for Video Diffusion Transformers
  Yuanyang Yin, Yaqi Zhao, Mingwu Zheng, Ke Lin, Jiarong Ou, ..., Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang, Kun Gai · 3 Jan 2025
• Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
  Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar · 30 Dec 2024
• How Does Critical Batch Size Scale in Pre-training?
  Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade · 29 Oct 2024
• Can Optimization Trajectories Explain Multi-Task Transfer?
  David Mueller, Mark Dredze, Nicholas Andrews · 26 Aug 2024
• High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
  Atish Agarwala, Jeffrey Pennington · 30 Apr 2024
• Principled Architecture-aware Scaling of Hyperparameters
  Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin · 27 Feb 2024 · [AI4CE]
• AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size
  P. Ostroukhov, Aigerim Zhumabayeva, Chulu Xiang, Alexander Gasnikov, Martin Takáč, Dmitry Kamzolov · 7 Feb 2024 · [ODL]
• Flora: Low-Rank Adapters Are Secretly Gradient Compressors
  Yongchang Hao, Yanshuai Cao, Lili Mou · 5 Feb 2024
• Refining the ONCE Benchmark with Hyperparameter Tuning
  Maksim Golyadkin, Alexander Gambashidze, Ildar Nurgaliev, Ilya Makarov · 10 Nov 2023
• Efficient Convolution and Transformer-Based Network for Video Frame Interpolation
  Issa Khalifeh, L. Murn, M. Mrak, E. Izquierdo · 12 Jul 2023 · [ViT]
• No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
  Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 12 Jul 2023
• Deep Huber quantile regression networks
  Hristos Tyralis, Georgia Papacharalampous, N. Dogulu, Kwok-Pan Chun · 17 Jun 2023 · [UQCV]
• Intelligent gradient amplification for deep neural networks
  S. Basodi, K. Pusuluri, Xueli Xiao, Yi Pan · 29 May 2023 · [ODL]
• Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching
  S. Tyagi, Prateek Sharma · 20 May 2023
• Phase transitions in the mini-batch size for sparse and dense two-layer neural networks
  Raffaele Marino, F. Ricci-Tersenghi · 10 May 2023
• Do deep neural networks have an inbuilt Occam's razor?
  Chris Mingard, Henry Rees, Guillermo Valle Pérez, A. Louis · 13 Apr 2023 · [UQCV, BDL]
• SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
  Kfir Y. Levy · 9 Apr 2023 · [FedML]
• On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
  Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao · 7 Apr 2023 · [VLM]
• Inductive biases in deep learning models for weather prediction
  Jannik Thümmel, Matthias Karlbauer, S. Otte, C. Zarfl, Georg Martius, ..., Thomas Scholten, Ulrich Friedrich, V. Wulfmeyer, B. Goswami, Martin Volker Butz · 6 Apr 2023 · [AI4CE]
• Learning Rate Schedules in the Presence of Distribution Shift
  Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah · 27 Mar 2023
• Revisiting the Noise Model of Stochastic Gradient Descent
  Barak Battash, Ofir Lindenbaum · 5 Mar 2023
• A reinforcement learning path planning approach for range-only underwater target localization with autonomous vehicles
  Ivan Masmitja, Mario Martin, K. Katija, S. Gomáriz, J. Navarro · 17 Jan 2023
• FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning
  Young Geun Kim, Carole-Jean Wu · 30 Nov 2022 · [FedML]
• Aspects of scaling and scalability for flow-based sampling of lattice QCD
  Ryan Abbott, M. S. Albergo, Aleksandar Botev, D. Boyda, Kyle Cranmer, ..., Ali Razavi, Danilo Jimenez Rezende, F. Romero-López, P. Shanahan, Julian M. Urban · 14 Nov 2022
• Breadth-First Pipeline Parallelism
  J. Lamy-Poirier · 11 Nov 2022 · [GNN, MoE, AI4CE]
• Adaptive scaling of the learning rate by second order automatic differentiation
  F. Gournay, Alban Gossard · 26 Oct 2022 · [ODL]
• Large Batch and Patch Size Training for Medical Image Segmentation
  Junya Sato, Shoji Kido · 24 Oct 2022
• OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
  Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty · 24 Oct 2022
• A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
  O. Oyedotun, Konstantinos Papadopoulos, Djamila Aouada · 21 Oct 2022 · [AI4CE]
• Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores
  Arsany Guirguis, Diana Petrescu, Florin Dinu, D. Quoc, Javier Picorel, R. Guerraoui · 16 Oct 2022
• Characterization of anomalous diffusion through convolutional transformers
  Nicolás Firbas, Òscar Garibo i Orts, M. Garcia-March, J. A. Conejero · 10 Oct 2022
• Why neural networks find simple solutions: the many regularizers of geometric complexity
  Benoit Dherin, Michael Munn, M. Rosca, David Barrett · 27 Sep 2022
• Critical Bach Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
  Hideaki Iiduka · 21 Aug 2022 · [ODL]
• Dynamic Batch Adaptation
  Cristian Simionescu, George Stoica, Robert Herscovici · 1 Aug 2022 · [ODL]
• Efficient NLP Model Finetuning via Multistage Data Filtering
  Ouyang Xu, S. Ansari, F. Lin, Yangfeng Ji · 28 Jul 2022
• ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
  Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, A. Raju, ..., Andrew Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure · 19 Jul 2022 · [CLL]
• e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
  Wonyoung Shin, Jonghun Park, Taekang Woo, Yongwoo Cho, Kwangjin Oh, Hwanjun Song · 1 Jul 2022 · [VLM]
• On the fast convergence of minibatch heavy ball momentum
  Raghu Bollapragada, Tyler Chen, Rachel A. Ward · 15 Jun 2022
• Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
  Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis · 14 Jun 2022
• MLLess: Achieving Cost Efficiency in Serverless Machine Learning Training
  Pablo Gimeno Sarroca, Marc Sánchez Artigas · 12 Jun 2022
• AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
  Juncheng Billy Li, Shuhui Qu, Po-Yao (Bernie) Huang, Florian Metze · 25 Mar 2022 · [VLM]
• Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
  Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao · 7 Mar 2022
• ES-dRNN with Dynamic Attention for Short-Term Load Forecasting
  Slawek Smyl, Grzegorz Dudek, Paweł Pełka · 2 Mar 2022 · [AI4TS]
• Cyclical Focal Loss
  L. Smith · 16 Feb 2022