Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes

24 April 2022

Chao Ma

Papers citing "Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes"

25 / 25 papers shown

Title | Authors | Tags | | Date
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes | Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett | | 20 0 0 | 05 Apr 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos | Dayal Singh Kalra, Tianyu He, M. Barkeshli | | 47 4 0 | 17 Feb 2025
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training | Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan | AAML | 57 0 0 | 14 Oct 2024
Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization | Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett | | 32 2 0 | 12 Jun 2024
Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment | Mark Lowell, Catharine A. Kastner | | 20 0 0 | 31 May 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization | Mingze Wang, Haotian He, Jinbo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu | | 37 6 0 | 31 May 2024
Corridor Geometry in Gradient-Based Optimization | Benoit Dherin, M. Rosca | | 25 0 0 | 13 Feb 2024
Data-induced multiscale losses and efficient multirate gradient descent schemes | Juncai He, Liangchen Liu, Yen-Hsi Tsai | | 20 0 0 | 05 Feb 2024
Understanding the Generalization Benefits of Late Learning Rate Decay | Yinuo Ren, Chao Ma, Lexing Ying | AI4CE | 19 6 0 | 21 Jan 2024
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling | Mingze Wang, Zeping Min, Lei Wu | | 25 3 0 | 24 Nov 2023
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization | Elan Rosenfeld, Andrej Risteski | | 25 10 0 | 07 Nov 2023
A Quadratic Synchronization Rule for Distributed Deep Learning | Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang | | 41 1 0 | 22 Oct 2023
A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent | Mingze Wang, Lei Wu | | 17 3 0 | 01 Oct 2023
Neuro-Visualizer: An Auto-encoder-based Loss Landscape Visualization Method | Mohannad Elhamod, Anuj Karpatne | | 10 1 0 | 26 Sep 2023
Sharpness-Aware Minimization and the Edge of Stability | Philip M. Long, Peter L. Bartlett | AAML | 25 9 0 | 21 Sep 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent | Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao | | 34 15 0 | 05 Jun 2023
Loss Spike in Training Neural Networks | Zhongwang Zhang, Z. Xu | | 28 4 0 | 20 May 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability | Jingfeng Wu, Vladimir Braverman, Jason D. Lee | | 24 16 0 | 19 May 2023
Training trajectories, mini-batch losses and the curious role of the learning rate | Mark Sandler, A. Zhmoginov, Max Vladymyrov, Nolan Miller | ODL | 11 10 0 | 05 Jan 2023
Learning threshold neurons via the "edge of stability" | Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang | MLT | 31 36 0 | 14 Dec 2022
On the Implicit Bias in Deep-Learning Algorithms | Gal Vardi | FedML, AI4CE | 30 72 0 | 26 Aug 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction | Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora | FAtt | 35 69 0 | 14 Jun 2022
A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks | Yuxin Sun, Dong Lao, G. Sundaramoorthi, A. Yezzi | | 19 3 0 | 04 Jun 2022
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework | Zhiyuan Li, Tianhao Wang, Sanjeev Arora | MLT | 83 98 0 | 13 Oct 2021
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect | Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao | AI4CE | 55 40 0 | 07 Oct 2021