arXiv:1910.05446
On Empirical Comparisons of Optimizers for Deep Learning
11 October 2019
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
Papers citing "On Empirical Comparisons of Optimizers for Deep Learning"
50 / 105 papers shown
OODTE: A Differential Testing Engine for the ONNX Optimizer
Nikolaos Louloudakis
Ajitha Rajan
46
0
0
03 May 2025
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Rui Pan
Shiqian Ma
D. Goldfarb
Tong Zhang
ODL
99
2
0
26 Mar 2025
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
53
0
0
10 Feb 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari
Issei Sato
ODL
61
1
0
31 Jan 2025
Learning Versatile Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
226
0
0
22 Jan 2025
Brain-to-Text Benchmark '24: Lessons Learned
Francis R. Willett
Jingyuan Li
Trung Le
Chaofei Fan
Mingfei Chen
...
Maxwell Kounga
E. Kelly Buchanan
D. Zoltowski
Scott W. Linderman
Jaimie M. Henderson
32
0
0
23 Dec 2024
A Mirror Descent Perspective of Smoothed Sign Descent
Shuyang Wang
Diego Klabjan
43
0
0
18 Oct 2024
The Epochal Sawtooth Effect: Unveiling Training Loss Oscillations in Adam and Other Optimizers
Qi Liu
Wanjing Ma
21
0
0
14 Oct 2024
Unraveling the Hessian: A Key to Smooth Convergence in Loss Function Landscapes
Nikita Kiselev
Andrey Grabovoy
54
1
0
18 Sep 2024
An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing
Xinlang Yue
Yiran Liu
Fangzhou Shi
Sihong Luo
Chen Zhong
Min Lu
Zhe Xu
26
0
0
20 Aug 2024
Narrowing the Focus: Learned Optimizers for Pretrained Models
Gus Kristiansen
Mark Sandler
A. Zhmoginov
Nolan Miller
Anirudh Goyal
Jihwan Lee
Max Vladymyrov
39
1
0
17 Aug 2024
Uncertainty-Informed Volume Visualization using Implicit Neural Representation
Shanu Saklani
Chitwan Goel
Shrey Bansal
Zhe Wang
Soumya Dutta
Tushar M. Athawale
D. Pugmire
Christopher R. Johnson
50
0
0
12 Aug 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Rui Pan
Tong Zhang
26
6
0
21 Jun 2024
RoBERTa-BiLSTM: A Context-Aware Hybrid Model for Sentiment Analysis
Md. Mostafizer Rahman
Ariful Islam Shiplu
Yutaka Watanobe
Md. Ashad Alam
38
10
0
01 Jun 2024
Visual Analysis of Prediction Uncertainty in Neural Networks for Deep Image Synthesis
Soumya Dutta
Faheem Nizar
Ahmad Amaan
Ayan Acharya
AAML
48
1
0
22 May 2024
Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications
Lucas Böttcher
Gregory R. Wheeler
32
0
0
05 Apr 2024
Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects
Han Wang
Sijia Yu
Chunyang Chen
Burak Turhan
Xiaodong Zhu
ELM
MLAU
28
2
0
26 Feb 2024
Predictive Churn with the Set of Good Models
J. Watson-Daniels
Flavio du Pin Calmon
Alexander D'Amour
Carol Xuan Long
David C. Parkes
Berk Ustun
88
7
0
12 Feb 2024
Should I try multiple optimizers when fine-tuning pre-trained Transformers for NLP tasks? Should I tune their hyperparameters?
Nefeli Gkouti
Prodromos Malakasiotis
Stavros Toumpis
Ion Androutsopoulos
40
5
0
10 Feb 2024
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard Turner
Alireza Makhzani
ODL
57
12
0
05 Feb 2024
Breaking MLPerf Training: A Case Study on Optimizing BERT
Yongdeok Kim
Jaehyung Ahn
Myeongwoo Kim
Changin Choi
Heejae Kim
...
Xiongzhan Linghu
Jingkun Ma
Lin Chen
Yuehua Dai
Sungjoo Yoo
30
0
0
04 Feb 2024
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
Chengli Tan
Jiangshe Zhang
Junmin Liu
Yicheng Wang
Yunda Hao
AAML
39
1
0
14 Jan 2024
Balancing Act: Constraining Disparate Impact in Sparse Models
Meraj Hashemizadeh
Juan Ramirez
Rohan Sukumaran
G. Farnadi
Simon Lacoste-Julien
Jose Gallego-Posada
33
5
0
31 Oct 2023
Is Scaling Learned Optimizers Worth It? Evaluating The Value of VeLO's 4000 TPU Months
Fady Rezk
Antreas Antoniou
Henry Gouk
Timothy M. Hospedales
ELM
32
1
0
27 Oct 2023
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić
Dylan R. Ashley
Oleg Serikov
Louis Kirsch
Francesco Faccio
Jürgen Schmidhuber
Thomas Hofmann
Imanol Schlag
MoE
45
9
0
20 Sep 2023
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Pranshu Malviya
Gonçalo Mordido
A. Baratin
Reza Babanezhad Harikandeh
Jerry Huang
Simon Lacoste-Julien
Razvan Pascanu
Sarath Chandar
ODL
36
1
0
18 Jul 2023
Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning
Achraf Bahamou
D. Goldfarb
ODL
36
0
0
23 May 2023
An Empirical Comparison of Optimizers for Quantum Machine Learning with SPSA-based Gradients
Marco Wiedmann
Marc Hölle
Maniraman Periyasamy
Nico Meyer
Christian Ufrecht
Daniel D. Scherer
Axel Plinge
Christopher Mutschler
80
18
0
27 Apr 2023
Bayesian Optimization Meets Self-Distillation
HyunJae Lee
Heon Song
Hyeonsoo Lee
Gi-hyeon Lee
Suyeong Park
Donggeun Yoo
UQCV
BDL
41
1
0
25 Apr 2023
Green Federated Learning
Ashkan Yousefpour
Sheng Guo
Ashish Shenoy
Sayan Ghosh
Pierre Stock
Kiwan Maeng
Schalk-Willem Kruger
Michael G. Rabbat
Carole-Jean Wu
Ilya Mironov
FedML
AI4CE
48
10
0
26 Mar 2023
Improving physics-informed neural networks with meta-learned optimization
Alexander Bihlo
PINN
36
18
0
13 Mar 2023
Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks
D. Pasechnyuk
Anton Prazdnichnykh
Mikhail Evtikhiev
T. Bryksin
34
1
0
06 Mar 2023
AI Security for Geoscience and Remote Sensing: Challenges and Future Trends
Yonghao Xu
Tao Bai
Weikang Yu
Shizhen Chang
P. M. Atkinson
Pedram Ghamisi
AAML
40
47
0
19 Dec 2022
A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms
Danimir T. Doncevic
Alexander Mitsos
Yu Guo
Qianxiao Li
Felix Dietrich
Manuel Dahmen
Ioannis G. Kevrekidis
21
7
0
22 Nov 2022
VeLO: Training Versatile Learned Optimizers by Scaling Up
Luke Metz
James Harrison
C. Freeman
Amil Merchant
Lucas Beyer
...
Naman Agrawal
Ben Poole
Igor Mordatch
Adam Roberts
Jascha Narain Sohl-Dickstein
37
60
0
17 Nov 2022
Empirical Study on Optimizer Selection for Out-of-Distribution Generalization
Hiroki Naganuma
Kartik Ahuja
S. Takagi
Tetsuya Motokawa
Rio Yokota
Kohta Ishikawa
I. Sato
Ioannis Mitliagkas
OOD
13
7
0
15 Nov 2022
Flatter, faster: scaling momentum for optimal speedup of SGD
Aditya Cowsik
T. Can
Paolo Glorioso
62
5
0
28 Oct 2022
Dissecting adaptive methods in GANs
Samy Jelassi
David Dobre
A. Mensch
Yuanzhi Li
Gauthier Gidel
24
4
0
09 Oct 2022
Improving Multi-fidelity Optimization with a Recurring Learning Rate for Hyperparameter Tuning
HyunJae Lee
Gihyeon Lee
Junh-Nam Kim
Sungjun Cho
Dohyun Kim
Donggeun Yoo
41
3
0
26 Sep 2022
Visualizing high-dimensional loss landscapes with Hessian directions
Lucas Böttcher
Gregory R. Wheeler
37
13
0
28 Aug 2022
Automatic Synthesis of Neurons for Recurrent Neural Nets
R. Olsson
C. Tran
L. Magnusson
15
2
0
29 Jun 2022
Dissecting U-net for Seismic Application: An In-Depth Study on Deep Learning Multiple Removal
Ricard Durall
A. Ghanim
N. Ettrich
J. Keuper
12
2
0
24 Jun 2022
Near-optimal control of dynamical systems with neural ordinary differential equations
Lucas Böttcher
Thomas Asikis
AI4CE
14
19
0
22 Jun 2022
Modeling the Machine Learning Multiverse
Samuel J. Bell
Onno P. Kampman
Jesse Dodge
Neil D. Lawrence
28
17
0
13 Jun 2022
On Distributed Adaptive Optimization with Gradient Compression
Xiaoyun Li
Belhal Karimi
Ping Li
23
25
0
11 May 2022
The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning
Jessica Hullman
Sayash Kapoor
Priyanka Nanayakkara
Andrew Gelman
Arvind Narayanan
33
39
0
12 Mar 2022
Adaptive Gradient Methods with Local Guarantees
Zhou Lu
Wenhan Xia
Sanjeev Arora
Elad Hazan
ODL
27
9
0
02 Mar 2022
A Mini-Block Fisher Method for Deep Neural Networks
Achraf Bahamou
D. Goldfarb
Yi Ren
ODL
39
9
0
08 Feb 2022
Nearest neighbor search with compact codes: A decoder perspective
Kenza Amara
Matthijs Douze
Alexandre Sablayrolles
Hervé Jégou
MQ
6
6
0
17 Dec 2021
Predicting the utility of search spaces for black-box optimization: a simple, budget-aware approach
Setareh Ariafar
Justin Gilmer
Zachary Nado
Jasper Snoek
Rodolphe Jenatton
George E. Dahl
46
1
0
15 Dec 2021