Don't Decay the Learning Rate, Increase the Batch Size (arXiv:1711.00489)
1 November 2017
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
ODL

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

50 / 179 papers shown
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li, Shuhui Qu, Po-Yao (Bernie) Huang, Florian Metze
VLM
25 Mar 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao
07 Mar 2022

ES-dRNN with Dynamic Attention for Short-Term Load Forecasting
Slawek Smyl, Grzegorz Dudek, Paweł Pełka
AI4TS
02 Mar 2022

Cyclical Focal Loss
L. Smith
16 Feb 2022

A Group-Equivariant Autoencoder for Identifying Spontaneously Broken Symmetries
Devanshu Agrawal, A. Del Maestro, Steven Johnston, James Ostrowski
DRL, AI4CE
13 Feb 2022

Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli
09 Feb 2022
PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler H. Summers, John Lygeros
01 Feb 2022

Computationally Efficient Approximations for Matrix-based Renyi's Entropy
Tieliang Gong, Yuxin Dong, Shujian Yu, B. Dong
27 Dec 2021

Automated Deep Learning: Neural Architecture Search Is Not the End
Xuanyi Dong, D. Kedziora, Katarzyna Musial, Bogdan Gabrys
16 Dec 2021

Minimization of Stochastic First-order Oracle Complexity of Adaptive Methods for Nonconvex Optimization
Hideaki Iiduka
14 Dec 2021

Hybrid BYOL-ViT: Efficient approach to deal with small datasets
Safwen Naimi, Rien van Leeuwen, W. Souidène, S. B. Saoud
SSL, ViT
08 Nov 2021

Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi, Masaaki Imaizumi
07 Nov 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You
01 Nov 2021
BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny, Marina Neseem, Sherief Reda
MQ
29 Oct 2021

A Sequence to Sequence Model for Extracting Multiple Product Name Entities from Dialog
Praneeth Gubbala, Xuan Zhang
28 Oct 2021

NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters
Yoichi Hirose, Nozomu Yoshinari, Shinichi Shirakawa
19 Oct 2021

Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers
Yujing Ma, Florin Rusu, Kesheng Wu, A. Sim
13 Oct 2021

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang, Hua Wang, Weijie J. Su
11 Oct 2021

Batch size-invariance for policy optimization
Jacob Hilton, K. Cobbe, John Schulman
01 Oct 2021

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham, Risto Miikkulainen
ODL
18 Sep 2021

sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Gabriel Bénédict, Vincent Koops, Daan Odijk, Maarten de Rijke
24 Aug 2021
Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
Chen Sun, Shenggui Li, Jinyue Wang, Jun Yu
08 Aug 2021

Large-Scale Differentially Private BERT
Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, Pasin Manurangsi
03 Aug 2021

BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
Zhengchun Liu, R. Kettimuthu, M. Papka, Ian Foster
22 Jun 2021

Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker
22 Jun 2021

Deep Learning Through the Lens of Example Difficulty
R. Baldock, Hartmut Maennel, Behnam Neyshabur
17 Jun 2021

On Large-Cohort Training for Federated Learning
Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith
FedML
15 Jun 2021

Federated Learning with Buffered Asynchronous Aggregation
John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael G. Rabbat, Mani Malek, Dzmitry Huba
FedML
11 Jun 2021

Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier
MoE
04 Jun 2021
Concurrent Adversarial Learning for Large-Batch Training
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You
ODL
01 Jun 2021

Deep Neural Network as an alternative to Boosted Decision Trees for PID
Denis Stanev, Riccardo Riva, Michele Umassi
PINN
28 Apr 2021

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama
31 Mar 2021

On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos
28 Feb 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora
24 Feb 2021

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
Guojun Xiong, Gang Yan, Rahul Singh, Jian Li
11 Feb 2021

Large-Scale Training System for 100-Million Classification at Alibaba
Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, Rong Jin
09 Feb 2021

Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar
16 Dec 2020
An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training
Federico Zocco, Seán F. McLoone
ODL
14 Dec 2020

How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
Erik Wijmans, Irfan Essa, Dhruv Batra
3DPC
11 Dec 2020

Towards constraining warm dark matter with stellar streams through neural simulation-based inference
Joeri Hermans, N. Banik, Christoph Weniger, G. Bertone, Gilles Louppe
30 Nov 2020

Dynamic Hard Pruning of Neural Networks at the Edge of the Internet
Lorenzo Valerio, F. M. Nardini, A. Passarella, R. Perego
17 Nov 2020

Reverse engineering learned optimizers reveals known and novel mechanisms
Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Narain Sohl-Dickstein
04 Nov 2020

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov
14 Oct 2020

Improved generalization by noise enhancement
Takashi Mori, Masahito Ueda
28 Sep 2020

Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller, Mario Geiger, Tess E. Smidt, Frank Noé
19 Aug 2020
A Survey on Large-scale Machine Learning
Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu
10 Aug 2020

Linear discriminant initialization for feed-forward neural networks
Marissa Masden, D. Sinha
FedML
24 Jul 2020

On stochastic mirror descent with interacting particles: convergence properties and variance reduction
Anastasia Borovykh, N. Kantas, P. Parpas, G. Pavliotis
15 Jul 2020

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
ODL
09 Jul 2020

Coded Distributed Computing with Partial Recovery
Emre Ozfatura, S. Ulukus, Deniz Gunduz
04 Jul 2020