ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information
arXiv:2405.12807 — 21 May 2024
Dongseong Hwang (ODL)

Papers citing "FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information"

8 / 8 papers shown
  • Fine-Tuning TransMorph with Gradient Correlation for Anatomical Alignment
    Lukas Förner, Kartikay Tehlan, Thomas Wendler (MedIm) — 31 Dec 2024
  • High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR
    Sourav Banerjee, Ayushi Agarwal, Promila Ghosh — 24 Nov 2024
  • Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
    Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani (ODL) — 05 Feb 2024
  • Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
    Frederik Kunstner, Jacques Chen, J. Lavington, Mark W. Schmidt — 27 Apr 2023
  • MetNet: A Neural Weather Model for Precipitation Forecasting
    C. Sønderby, L. Espeholt, Jonathan Heek, Mostafa Dehghani, Avital Oliver, Tim Salimans, Shreya Agrawal, Jason Hickey, Nal Kalchbrenner (AI4Cl) — 24 Mar 2020
  • A Simple Convergence Proof of Adam and Adagrad
    Alexandre Défossez, Léon Bottou, Francis R. Bach, Nicolas Usunier — 05 Mar 2020
  • Scaling Laws for Neural Language Models
    Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei — 23 Jan 2020
  • MCMC using Hamiltonian dynamics
    Radford M. Neal — 09 Jun 2012