AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size
arXiv:2402.05264 · 7 February 2024
P. Ostroukhov, Aigerim Zhumabayeva, Chulu Xiang, Alexander Gasnikov, Martin Takáč, Dmitry Kamzolov
Papers citing "AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size" (4 / 4 papers shown)
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
52 · 0 · 0 · 30 Dec 2024

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
25 · 0 · 0 · 20 Jun 2024

SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms
Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Robert Mansel Gower, Martin Takáč
27 · 2 · 0 · 28 Dec 2023

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
273 · 2,696 · 0 · 15 Sep 2016