On Learning Rates and Schrödinger Operators
Bin Shi, Weijie J. Su, Michael I. Jordan
arXiv:2004.06977 · 15 April 2020
Papers citing "On Learning Rates and Schrödinger Operators" (16 papers)
FOCUS: First Order Concentrated Updating Scheme — Yizhou Liu, Ziming Liu, Jeff Gore (21 Jan 2025)
A General Continuous-Time Formulation of Stochastic ADMM and Its Variants — Chris Junchi Li (22 Apr 2024)
Quantum Langevin Dynamics for Optimization — Zherui Chen, Yuchen Lu, Hao Wang, Yizhou Liu, Tongyang Li (27 Nov 2023)
On Underdamped Nesterov's Acceleration — Shu Chen, Bin Shi, Ya-xiang Yuan (28 Apr 2023)
Learning Rate Schedules in the Presence of Distribution Shift — Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah (27 Mar 2023)
Global Convergence of SGD On Two Layer Neural Nets — Pulkit Gopalani, Anirbit Mukherjee (20 Oct 2022)
On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks — Yizhou Liu, Weijie J. Su, Tongyang Li (29 Sep 2022)
Gradient Norm Minimization of Nesterov Acceleration: $o(1/k^3)$ — Shu Chen, Bin Shi, Ya-xiang Yuan (19 Sep 2022)
On Uniform Boundedness Properties of SGD and its Momentum Variants — Xiaoyu Wang, M. Johansson (25 Jan 2022)
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations — Jiayao Zhang, Hua Wang, Weijie J. Su (11 Oct 2021)
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) — Zhiyuan Li, Sadhika Malladi, Sanjeev Arora (24 Feb 2021)
Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training — Cong Fang, Hangfeng He, Qi Long, Weijie J. Su (29 Jan 2021)
Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge — Chaoyang He, M. Annavaram, A. Avestimehr (28 Jul 2020)
On stochastic mirror descent with interacting particles: convergence properties and variance reduction — Anastasia Borovykh, N. Kantas, P. Parpas, G. Pavliotis (15 Jul 2020)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima — N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)
A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights — Weijie Su, Stephen P. Boyd, Emmanuel J. Candes (04 Mar 2015)