arXiv: 1712.02029
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda, Maxim Naumov, M. Garland
6 December 2017
Papers citing "AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks" (50 of 55 papers shown)
DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
Yuen Chen, Yian Wang, Hari Sundaram
19 Sep 2025

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
30 Dec 2024

Illustrious: an Open Advanced Illustration Model
Sang Hyun Park, Jun Young Koh, Junha Lee, Joy Song, Dongha Kim, Hoyeon Moon, Hyunju Lee, Min Song
30 Sep 2024

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
20 Jun 2024

Optimal Batch Allocation for Wireless Federated Learning
IEEE Internet of Things Journal (IEEE IoT J.), 2024
Jaeyoung Song, Sang-Woon Jeon
03 Apr 2024

AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau, Han Liu, Mladen Kolar
17 Feb 2024

Training DNN Models over Heterogeneous Clusters with Optimal Performance
Chengyi Nie, Jessica Maghakian, Zhenhua Liu
07 Feb 2024

Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang, Jinlong Liu, Zixiang Ding, Lin Guo, W. Lin
24 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat
12 Sep 2023

Wirelessly Powered Federated Learning Networks: Joint Power Transfer, Data Sensing, Model Training, and Resource Allocation
IEEE Internet of Things Journal (IEEE IoT J.), 2023
Mai Le, D. Hoang, Diep N. Nguyen, Won Joo Hwang, Quoc-Viet Pham
09 Aug 2023

Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching
International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), 2020
S. Tyagi, Prateek Sharma
20 May 2023

AccelAT: A Framework for Accelerating the Adversarial Training of Deep Neural Networks through Accuracy Gradient
IEEE Access (IEEE Access), 2022
F. Nikfam, Alberto Marchisio, Maurizio Martina, Mohamed Bennai
13 Oct 2022

Resource-aware Deep Learning for Wireless Fingerprinting Localization
Gregor Cerar, Blaž Bertalanič, Carolina Fortuna
12 Oct 2022

Dynamic Batch Adaptation
Cristian Simionescu, George Stoica, Robert Herscovici
01 Aug 2022

Modeling the Machine Learning Multiverse
Neural Information Processing Systems (NeurIPS), 2022
Samuel J. Bell, Onno P. Kampman, Jesse Dodge, Neil D. Lawrence
13 Jun 2022

Hyper-Learning for Gradient-Based Batch Size Adaptation
Calum MacLellan, Feng Dong
17 May 2022

FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 2022
Yunzhuo Liu, Bo Jiang, Tian Guo, Zimeng Huang, Wen-ping Ma, Xinbing Wang, Chenghu Zhou
28 Apr 2022
Towards Sustainable Deep Learning for Wireless Fingerprinting Localization
Anže Pirnat, Blaž Bertalanič, Gregor Cerar, M. Mohorčič, Marko Meza, Carolina Fortuna
22 Jan 2022

Simple and Effective Balance of Contrastive Losses
Arnaud Sors, Rafael Sampaio de Rezende, Sarah Ibrahimi, J. Andreoli
22 Dec 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You
01 Nov 2021

Concurrent Adversarial Learning for Large-Batch Training
International Conference on Learning Representations (ICLR), 2021
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You
01 Jun 2021

Fast Jacobian-Vector Product for Deep Networks
Randall Balestriero, Richard Baraniuk
01 Apr 2021

On the Utility of Gradient Compression in Distributed Training Systems
Conference on Machine Learning and Systems (MLSys), 2021
Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos
28 Feb 2021

Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
Fengli Gao, Huicai Zhong
16 Dec 2020

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos
29 Oct 2020

Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
17 Sep 2020

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020
Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, G. Ganger, Eric Xing
27 Aug 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
International Conference on Machine Learning (ICML), 2020
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
09 Jul 2020

Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt, Frank Schneider, Philipp Hennig
03 Jul 2020

Effective Elastic Scaling of Deep Learning Workloads
IEEE/ACM International Symposium on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems (MASCOTS), 2020
Vaibhav Saxena, K.R. Jayaram, Saurav Basu, Yogish Sabharwal, Ashish Verma
24 Jun 2020

Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs
A. Rajagopal, D. A. Vink, Stylianos I. Venieris, C. Bouganis
16 Jun 2020

HierTrain: Fast Hierarchical Edge AI Learning with Hybrid Parallelism in Mobile-Edge-Cloud Computing
IEEE Open Journal of the Communications Society (OJ-COMSOC), 2020
Deyin Liu, Xu Chen, Zhi Zhou, Qing Ling
22 Mar 2020

GeoDA: a geometric framework for black-box adversarial attacks
Computer Vision and Pattern Recognition (CVPR), 2020
A. Rahmati, Seyed-Mohsen Moosavi-Dezfooli, P. Frossard, H. Dai
13 Mar 2020

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
International Conference on Learning Representations (ICLR), 2020
Vipul Gupta, S. Serrano, D. DeCoste
07 Jan 2020

Improving the convergence of SGD through adaptive batch sizes
Scott Sievert, Zachary B. Charles
18 Oct 2019

FasTrCaps: An Integrated Framework for Fast yet Accurate Training of Capsule Networks
Alberto Marchisio, Beatrice Bussolino, Alessio Colucci, Muhammad Abdullah Hanif, Maurizio Martina, Guido Masera, Mohamed Bennai
24 May 2019

On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization
International Conference on Machine Learning (ICML), 2019
Hao Yu, Rong Jin
10 May 2019
Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Yanghua Peng, Hang Zhang, Yifei Ma, Tong He, Zhi-Li Zhang, Sheng Zha, Mu Li
26 Apr 2019

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
01 Apr 2019

Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma, Gabe Montague, Jiayu Ye, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney
14 Mar 2019

Large-Batch Training for LSTM and Beyond
Yang You, Jonathan Hseu, Chris Ying, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
24 Jan 2019

Batch Size Influence on Performance of Graphic and Tensor Processing Units during Training and Inference Phases
Yuriy Kochura, Yuri G. Gordienko, Vlad Taran, N. Gordienko, Alexandr Rokovyi, Oleg Alienin, S. Stirenko
31 Dec 2018

An Empirical Model of Large-Batch Training
Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team
14 Dec 2018

Parameter Re-Initialization through Cyclical Batch Size Schedules
Norman Mu, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney
04 Dec 2018

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez
30 Nov 2018
Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka
29 Nov 2018

On Periodic Functions as Regularizers for Quantization of Neural Networks
Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch
24 Nov 2018

Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama
13 Nov 2018

Measuring the Effects of Data Parallelism on Neural Network Training
Journal of machine learning research (JMLR), 2018
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang
08 Nov 2018

Large batch size training of neural networks with adversarial training and second-order information
Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Alfons Kemper, Kurt Keutzer, Michael W. Mahoney
02 Oct 2018