1711.00489
Don't Decay the Learning Rate, Increase the Batch Size
1 November 2017
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"
50 / 179 papers shown
Title
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
36
9
0
25 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
26
149
0
07 Mar 2022
ES-dRNN with Dynamic Attention for Short-Term Load Forecasting
Slawek Smyl
Grzegorz Dudek
Paweł Pełka
AI4TS
21
11
0
02 Mar 2022
Cyclical Focal Loss
L. Smith
35
14
0
16 Feb 2022
A Group-Equivariant Autoencoder for Identifying Spontaneously Broken Symmetries
Devanshu Agrawal
A. Del Maestro
Steven Johnston
James Ostrowski
DRL
AI4CE
36
2
0
13 Feb 2022
Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli
Maria Refinetti
Giulio Biroli
23
7
0
09 Feb 2022
PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
Matilde Gargiani
Andrea Zanelli
Andrea Martinelli
Tyler H. Summers
John Lygeros
33
14
0
01 Feb 2022
Computationally Efficient Approximations for Matrix-based Renyi's Entropy
Tieliang Gong
Yuxin Dong
Shujian Yu
B. Dong
67
2
0
27 Dec 2021
Automated Deep Learning: Neural Architecture Search Is Not the End
Xuanyi Dong
D. Kedziora
Katarzyna Musial
Bogdan Gabrys
29
26
0
16 Dec 2021
Minimization of Stochastic First-order Oracle Complexity of Adaptive Methods for Nonconvex Optimization
Hideaki Iiduka
13
0
0
14 Dec 2021
Hybrid BYOL-ViT: Efficient approach to deal with small datasets
Safwen Naimi
Rien van Leeuwen
W. Souidène
S. B. Saoud
SSL
ViT
25
2
0
08 Nov 2021
Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi
Masaaki Imaizumi
34
4
0
07 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
30
14
0
01 Nov 2021
BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny
Marina Neseem
Sherief Reda
MQ
35
4
0
29 Oct 2021
A Sequence to Sequence Model for Extracting Multiple Product Name Entities from Dialog
Praneeth Gubbala
Xuan Zhang
16
1
0
28 Oct 2021
NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters
Yoichi Hirose
Nozomu Yoshinari
Shinichi Shirakawa
25
13
0
19 Oct 2021
Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers
Yujing Ma
Florin Rusu
Kesheng Wu
A. Sim
46
3
0
13 Oct 2021
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang
Hua Wang
Weijie J. Su
35
8
0
11 Oct 2021
Batch size-invariance for policy optimization
Jacob Hilton
K. Cobbe
John Schulman
17
11
0
01 Oct 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham
Risto Miikkulainen
ODL
24
4
0
18 Sep 2021
sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Gabriel Bénédict
Vincent Koops
Daan Odijk
Maarten de Rijke
37
30
0
24 Aug 2021
Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
Chen Sun
Shenggui Li
Jinyue Wang
Jun Yu
54
47
0
08 Aug 2021
Large-Scale Differentially Private BERT
Rohan Anil
Badih Ghazi
Vineet Gupta
Ravi Kumar
Pasin Manurangsi
36
132
0
03 Aug 2021
BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
Zhengchun Liu
R. Kettimuthu
M. Papka
Ian Foster
34
3
0
22 Jun 2021
Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang
Xingyao Zhang
Shuaiwen Leon Song
Sara Hooker
25
75
0
22 Jun 2021
Deep Learning Through the Lens of Example Difficulty
R. Baldock
Hartmut Maennel
Behnam Neyshabur
47
156
0
17 Jun 2021
On Large-Cohort Training for Federated Learning
Zachary B. Charles
Zachary Garrett
Zhouyuan Huo
Sergei Shmulyian
Virginia Smith
FedML
21
113
0
15 Jun 2021
Federated Learning with Buffered Asynchronous Aggregation
John Nguyen
Kshitiz Malik
Hongyuan Zhan
Ashkan Yousefpour
Michael G. Rabbat
Mani Malek
Dzmitry Huba
FedML
33
289
0
11 Jun 2021
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier
MoE
29
8
0
04 Jun 2021
Concurrent Adversarial Learning for Large-Batch Training
Yong Liu
Xiangning Chen
Minhao Cheng
Cho-Jui Hsieh
Yang You
ODL
36
13
0
01 Jun 2021
Deep Neural Network as an alternative to Boosted Decision Trees for PID
Denis Stanev
Riccardo Riva
Michele Umassi
PINN
22
1
0
28 Apr 2021
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie
Li-xin Yuan
Zhanxing Zhu
Masashi Sugiyama
27
29
0
31 Mar 2021
On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal
Hongyi Wang
Shivaram Venkataraman
Dimitris Papailiopoulos
38
46
0
28 Feb 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li
Sadhika Malladi
Sanjeev Arora
44
78
0
24 Feb 2021
Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
Guojun Xiong
Gang Yan
Rahul Singh
Jian Li
33
12
0
11 Feb 2021
Large-Scale Training System for 100-Million Classification at Alibaba
Liuyihan Song
Pan Pan
Kang Zhao
Hao Yang
Yiming Chen
Yingya Zhang
Yinghui Xu
Rong Jin
40
23
0
09 Feb 2021
Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot
Junqi Yin
Mallikarjun Shankar
21
1
0
16 Dec 2020
An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training
Federico Zocco
Seán F. McLoone
ODL
26
4
0
14 Dec 2020
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
Erik Wijmans
Irfan Essa
Dhruv Batra
3DPC
30
10
0
11 Dec 2020
Towards constraining warm dark matter with stellar streams through neural simulation-based inference
Joeri Hermans
N. Banik
Christoph Weniger
G. Bertone
Gilles Louppe
30
29
0
30 Nov 2020
Dynamic Hard Pruning of Neural Networks at the Edge of the Internet
Lorenzo Valerio
F. M. Nardini
A. Passarella
R. Perego
25
12
0
17 Nov 2020
Reverse engineering learned optimizers reveals known and novel mechanisms
Niru Maheswaranathan
David Sussillo
Luke Metz
Ruoxi Sun
Jascha Narain Sohl-Dickstein
22
21
0
04 Nov 2020
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen
Jiquan Ngiam
Yanping Huang
Thang Luong
Henrik Kretzschmar
Yuning Chai
Dragomir Anguelov
41
207
0
14 Oct 2020
Improved generalization by noise enhancement
Takashi Mori
Masahito Ueda
24
3
0
28 Sep 2020
Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller
Mario Geiger
Tess E. Smidt
Frank Noé
21
75
0
19 Aug 2020
A Survey on Large-scale Machine Learning
Meng Wang
Weijie Fu
Xiangnan He
Shijie Hao
Xindong Wu
25
109
0
10 Aug 2020
Linear discriminant initialization for feed-forward neural networks
Marissa Masden
D. Sinha
FedML
29
3
0
24 Jul 2020
On stochastic mirror descent with interacting particles: convergence properties and variance reduction
Anastasia Borovykh
N. Kantas
P. Parpas
G. Pavliotis
28
12
0
15 Jul 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson
Pulkit Agrawal
Haijie Gu
Carlos Guestrin
ODL
30
37
0
09 Jul 2020
Coded Distributed Computing with Partial Recovery
Emre Ozfatura
S. Ulukus
Deniz Gunduz
38
28
0
04 Jul 2020