Why Do We Need Weight Decay in Modern Deep Learning?

6 October 2023

Papers citing "Why Do We Need Weight Decay in Modern Deep Learning?"

GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis. Angelos Zavras, Dimitrios Michail, Xiao Xiang Zhu, Begum Demir, Ioannis Papoutsis. 13 Feb 2025.
FOCUS: First Order Concentrated Updating Scheme. Yizhou Liu, Ziming Liu, Jeff Gore. 21 Jan 2025.
How Much Can We Forget about Data Contamination? Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko, U. V. Luxburg. 04 Oct 2024.
Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes. M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov. 08 Sep 2022.
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework. Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016.