arXiv: 2006.00719
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
1 June 2020
Z. Yao
A. Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
ODL
Papers citing
"ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning"
50 / 150 papers shown
Title
How important are activation functions in regression and classification? A survey, performance comparison, and future directions
Ameya Dilip Jagtap
George Karniadakis
AI4CE
37
71
0
06 Sep 2022
DRAGON: Decentralized Fault Tolerance in Edge Federations
Shreshth Tuli
G. Casale
N. Jennings
23
10
0
16 Aug 2022
A Practical Second-order Latent Factor Model via Distributed Particle Swarm Optimization
Jialiang Wang
Yurong Zhong
Weiling Li
21
0
0
12 Aug 2022
Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
Yucong Liu
Shixing Yu
Tong Lin
27
1
0
11 Aug 2022
Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs
Severin Reiz
T. Neckel
H. Bungartz
ODL
31
1
0
03 Aug 2022
Adaptive Second Order Coresets for Data-efficient Machine Learning
Omead Brandon Pooladzandi
David Davini
Baharan Mirzasoleiman
22
62
0
28 Jul 2022
Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang
S. Shi
Wei Wang
Bo-wen Li
36
10
0
30 Jun 2022
LogGENE: A smooth alternative to check loss for Deep Healthcare Inference Tasks
Aryaman Jeendgar
Aditya Pola
S. Dhavala
Snehanshu Saha
UQCV
11
1
0
19 Jun 2022
On Scaled Methods for Saddle Point Problems
Aleksandr Beznosikov
Aibek Alanov
D. Kovalev
Martin Takáč
Alexander Gasnikov
30
4
0
16 Jun 2022
Stochastic Gradient Methods with Preconditioned Updates
Abdurakhmon Sadiev
Aleksandr Beznosikov
Abdulla Jasem Almansoori
Dmitry Kamzolov
R. Tappenden
Martin Takáč
ODL
34
9
0
01 Jun 2022
FlexiBERT: Are Current Transformer Architectures too Homogeneous and Rigid?
Shikhar Tuli
Bhishma Dedhia
Shreshth Tuli
N. Jha
32
14
0
23 May 2022
HessianFR: An Efficient Hessian-based Follow-the-Ridge Algorithm for Minimax Optimization
Yihang Gao
Huafeng Liu
Michael K. Ng
Mingjie Zhou
25
2
0
23 May 2022
A Dynamic Weighted Tabular Method for Convolutional Neural Networks
Md Ifraham Iqbal
Md. Saddam Hossain Mukta
Ahmed Rafi Hasan
LMTD
21
12
0
20 May 2022
Second-Order Sensitivity Analysis for Bilevel Optimization
Robert Dyro
Edward Schmerling
Nikos Arechiga
Marco Pavone
21
3
0
04 May 2022
A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic Regularized Optimization
Jarad Forristal
J. Griffin
Wenwen Zhou
S. Yektamaram
ODL
12
0
0
19 Apr 2022
The Right to be Forgotten in Federated Learning: An Efficient Realization with Rapid Retraining
Yi Liu
Lei Xu
Xingliang Yuan
Cong Wang
Bo Li
MU
22
142
0
14 Mar 2022
Efficient Natural Gradient Descent Methods for Large-Scale PDE-Based Optimization Problems
L. Nurbekyan
Wanzhou Lei
Yunbo Yang
15
12
0
13 Feb 2022
A Mini-Block Fisher Method for Deep Neural Networks
Achraf Bahamou
D. Goldfarb
Yi Ren
ODL
34
9
0
08 Feb 2022
Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size
Adityanarayanan Radhakrishnan
M. Belkin
Caroline Uhler
ODL
21
0
0
30 Dec 2021
GOSH: Task Scheduling Using Deep Surrogate Models in Fog Computing Environments
Shreshth Tuli
G. Casale
N. Jennings
32
21
0
16 Dec 2021
Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time
Zhao Song
Licheng Zhang
Ruizhe Zhang
32
64
0
14 Dec 2021
PredProp: Bidirectional Stochastic Optimization with Precision Weighted Predictive Coding
André Ofner
Sebastian Stober
17
2
0
16 Nov 2021
Predictive coding, precision and natural gradients
André Ofner
Raihan Kabir Ratul
Suhita Ghosh
Sebastian Stober
27
3
0
12 Nov 2021
Applications and Techniques for Fast Machine Learning in Science
A. Deiana
Nhan Tran
Joshua C. Agar
Michaela Blott
G. D. Guglielmo
...
Ashish Sharma
S. Summers
Pietro Vischia
J. Vlimant
Olivia Weng
14
71
0
25 Oct 2021
Nys-Newton: Nyström-Approximated Curvature for Stochastic Optimization
Dinesh Singh
Hardik Tankaria
M. Yamada
ODL
42
2
0
16 Oct 2021
LightSeq2: Accelerated Training for Transformer-based Models on GPUs
Xiaohui Wang
Yang Wei
Ying Xiong
Guyue Huang
Xian Qian
Yufei Ding
Mingxuan Wang
Lei Li
VLM
13
30
0
12 Oct 2021
Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Juntang Zhuang
Yifan Ding
Tommy M. Tang
Nicha Dvornek
S. Tatikonda
James S. Duncan
ODL
24
4
0
11 Oct 2021
Stochastic Anderson Mixing for Nonconvex Stochastic Optimization
Fu Wei
Chenglong Bao
Yang Liu
30
19
0
04 Oct 2021
Scale-invariant Learning by Physics Inversion
Philipp Holl
V. Koltun
Nils Thuerey
PINN
AI4CE
21
8
0
30 Sep 2021
AdaInject: Injection Based Adaptive Gradient Descent Optimizers for Convolutional Neural Networks
S. Dubey
S. H. Shabbeer Basha
S. Singh
B. B. Chaudhuri
ODL
48
9
0
26 Sep 2021
Inequality Constrained Stochastic Nonlinear Optimization via Active-Set Sequential Quadratic Programming
Sen Na
M. Anitescu
Mladen Kolar
34
33
0
23 Sep 2021
Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information
Majid Jahani
S. Rusakov
Zheng Shi
Peter Richtárik
Michael W. Mahoney
Martin Takáč
ODL
24
25
0
11 Sep 2021
Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction
Yun Yue
Yongchao Liu
Suo Tong
Minghao Li
Zhen Zhang
Chunyang Wen
Huanjun Bao
Lihong Gu
Jinjie Gu
Yixiang Mu
ODL
AI4CE
25
2
0
30 Jul 2021
M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
Elias Frantar
Eldar Kurtic
Dan Alistarh
13
57
0
07 Jul 2021
KOALA: A Kalman Optimization Algorithm with Loss Adaptivity
A. Davtyan
Sepehr Sameni
L. Cerkezi
Givi Meishvili
Adam Bielski
Paolo Favaro
ODL
53
2
0
07 Jul 2021
LocalNewton: Reducing Communication Bottleneck for Distributed Learning
Vipul Gupta
Avishek Ghosh
Michal Derezinski
Rajiv Khanna
Kannan Ramchandran
Michael W. Mahoney
38
12
0
16 May 2021
Better SGD using Second-order Momentum
Hoang Tran
Ashok Cutkosky
ODL
18
12
0
04 Mar 2021
Hessian Eigenspectra of More Realistic Nonlinear Models
Zhenyu Liao
Michael W. Mahoney
25
29
0
02 Mar 2021
Quasi-Newton's method in the class gradient defined high-curvature subspace
Mark Tuddenham
Adam Prugel-Bennett
Jonathan Hare
ODL
25
7
0
28 Nov 2020
A Trace-restricted Kronecker-Factored Approximation to Natural Gradient
Kai-Xin Gao
Xiaolei Liu
Zheng-Hai Huang
Min Wang
Zidong Wang
Dachuan Xu
F. Yu
24
11
0
21 Nov 2020
Sparse sketches with small inversion bias
Michal Derezinski
Zhenyu Liao
Yan Sun
Michael W. Mahoney
23
21
0
21 Nov 2020
BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature Selection in Sublinear Memory
Amirali Aghazadeh
Vipul Gupta
Alex DeWeese
O. O. Koyluoglu
Kannan Ramchandran
8
2
0
26 Oct 2020
Dual Averaging is Surprisingly Effective for Deep Learning Optimization
Samy Jelassi
Aaron Defazio
33
4
0
20 Oct 2020
Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization
Xuezhe Ma
ODL
20
31
0
28 Sep 2020
Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection
Elizabeth Newman
Lars Ruthotto
Joseph L. Hart
B. V. B. Waanders
AAML
33
19
0
26 Jul 2020
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt
Frank Schneider
Philipp Hennig
ODL
40
162
0
03 Jul 2020
Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization
Thomas O'Leary-Roseberry
Nick Alger
Omar Ghattas
ODL
37
9
0
07 Feb 2020
PyHessian: Neural Networks Through the Lens of the Hessian
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
ODL
24
289
0
16 Dec 2019
OverSketched Newton: Fast Convex Optimization for Serverless Systems
Vipul Gupta
S. Kadhe
T. Courtade
Michael W. Mahoney
Kannan Ramchandran
19
33
0
21 Mar 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,984
0
20 Apr 2018