Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis
Stephan Wojtowytsch
arXiv:2106.02588, 4 June 2021
Papers citing "Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis" (13 papers):
1. Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training. Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024.
2. Stochastic Modified Flows for Riemannian Stochastic Gradient Descent. Benjamin Gess, Sebastian Kassing, Nimit Rana. 02 Feb 2024.
3. Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances. Marcel Kühn, B. Rosenow. 08 Jun 2023.
4. mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization. Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder. 19 Feb 2023.
5. Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent. Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi. 14 Feb 2023.
6. Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis. Taiki Miyagawa. 28 Oct 2022.
7. From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent. Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan. 13 Oct 2022.
8. Identical Image Retrieval using Deep Learning. Sayan Nath, Nikhil Nayak. 10 May 2022.
9. A Continuous-time Stochastic Gradient Descent Method for Continuous Data. Kexin Jin, J. Latz, Chenguang Liu, Carola-Bibiane Schönlieb. 07 Dec 2021.
10. What Happens after SGD Reaches Zero Loss? A Mathematical Framework. Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021.
11. Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis. Stephan Wojtowytsch. 04 May 2021.
12. Convergence of stochastic gradient descent schemes for Łojasiewicz-landscapes. Steffen Dereich, Sebastian Kassing. 16 Feb 2021.
13. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016.