WNGrad: Learn the Learning Rate in Gradient Descent

7 March 2018

Papers citing "WNGrad: Learn the Learning Rate in Gradient Descent"


Title
Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks | Vivak Patel, Christian Varner | - | 28 0 0 | 20 Sep 2024
A Novel Gradient Methodology with Economical Objective Function Evaluations for Data Science Applications | Christian Varner, Vivak Patel | - | 23 2 0 | 19 Sep 2023
The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks | Yuan Cao, Difan Zou, Yuanzhi Li, Quanquan Gu | MLT | 31 5 0 | 20 Jun 2023
On the Weight Dynamics of Deep Normalized Networks | Christian H. X. Ali Mehmeti-Göpel, Michael Wand | - | 32 1 0 | 01 Jun 2023
Robust Implicit Regularization via Weight Normalization | H. Chou, Holger Rauhut, Rachel A. Ward | - | 28 7 0 | 09 May 2023
Adaptive Gradient Methods with Local Guarantees | Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan | ODL | 22 9 0 | 02 Mar 2022
A Stochastic Bundle Method for Interpolating Networks | Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. P. Kumar | - | 24 4 0 | 29 Jan 2022
KOALA: A Kalman Optimization Algorithm with Loss Adaptivity | A. Davtyan, Sepehr Sameni, L. Cerkezi, Givi Meishvili, Adam Bielski, Paolo Favaro | ODL | 53 2 0 | 07 Jul 2021
Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective | Kushal Chakrabarti, Nikhil Chopra | ODL AI4CE | 31 9 0 | 31 May 2021
Flexible numerical optimization with ensmallen | Ryan R. Curtin, Marcus Edel, Rahul Prabhu, S. Basak, Zhihao Lou, Conrad Sanderson | - | 18 1 0 | 09 Mar 2020
LOSSGRAD: automatic learning rate in gradient descent | B. Wójcik, Lukasz Maziarka, Jacek Tabor | ODL | 32 4 0 | 20 Feb 2019
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization | Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu | - | 28 130 0 | 10 Dec 2018
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes | Rachel A. Ward, Xiaoxia Wu, Léon Bottou | ODL | 19 358 0 | 05 Jun 2018
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes | Xiaoyun Li, Francesco Orabona | - | 34 290 0 | 21 May 2018
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes | Ohad Shamir, Tong Zhang | - | 101 570 0 | 08 Dec 2012