Disentangling Adaptive Gradient Methods from Learning Rates

26 February 2020

Papers citing "Disentangling Adaptive Gradient Methods from Learning Rates"

15 / 15 papers shown

Title | Authors | Tag | (unlabeled counts) | Date
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation | Yaxiong Chen, Yujie Wang, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou | - | 54 0 0 | 18 Mar 2025
Deconstructing What Makes a Good Optimizer for Language Models | Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade | - | 50 17 0 | 10 Jul 2024
4-bit Shampoo for Memory-Efficient Network Training | Sike Wang, Jia Li, Pan Zhou, Hua Huang | MQ | 41 5 0 | 28 May 2024
Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods | A. Ma, Yangchen Pan, Amir-massoud Farahmand | AAML | 25 5 0 | 13 Aug 2023
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions | Vladimir Feinberg, Xinyi Chen, Y. Jennifer Sun, Rohan Anil, Elad Hazan | - | 29 12 0 | 07 Feb 2023
Disentangling the Mechanisms Behind Implicit Regularization in SGD | Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton | FedML | 27 2 0 | 29 Nov 2022
VeLO: Training Versatile Learned Optimizers by Scaling Up | Luke Metz, James Harrison, C. Freeman, Amil Merchant, Lucas Beyer, ..., Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Narain Sohl-Dickstein | - | 35 60 0 | 17 Nov 2022
On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models | Rohan Anil, S. Gadanho, Danya Huang, Nijith Jacob, Zhuoshu Li, ..., Cristina Pop, Kevin Regan, G. Shamir, Rakesh Shivanna, Qiqi Yan | 3DV | 26 41 0 | 12 Sep 2022
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit | Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang | - | 39 123 0 | 18 Jul 2022
Hamiltonian Monte Carlo Particle Swarm Optimizer | Omatharv Bharat Vaidya, Rithvik Terence DSouza, Snehanshu Saha, S. Dhavala, Swagatam Das | - | 16 0 0 | 08 May 2022
Adaptive Gradient Methods with Local Guarantees | Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan | ODL | 27 9 0 | 02 Mar 2022
Understanding AdamW through Proximal Methods and Scale-Freeness | Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona | - | 39 63 0 | 31 Jan 2022
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes | James Lucas, Juhan Bae, Michael Ruogu Zhang, Stanislav Fort, R. Zemel, Roger C. Grosse | MoMe | 164 28 0 | 22 Apr 2021
How to decay your learning rate | Aitor Lewkowycz | - | 41 24 0 | 23 Mar 2021
Shape Matters: Understanding the Implicit Bias of the Noise Covariance | Jeff Z. HaoChen, Colin Wei, J. Lee, Tengyu Ma | - | 29 93 0 | 15 Jun 2020