Scalable and Practical Natural Gradient for Large-Scale Deep Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
13 February 2020
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota
arXiv:2002.06015

Papers citing "Scalable and Practical Natural Gradient for Large-Scale Deep Learning"

15 papers

Beyond the Mean: Fisher-Orthogonal Projection for Natural Gradient Descent in Large Batch Training
Yishun Lu, Wesley Armour
19 Aug 2025

Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization
Siddharth Singh, Zack Sating, A. Bhatele
18 Oct 2023

Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang, Shaoshuai Shi, Yue Liu
04 Aug 2023

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch
Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler
08 May 2023

Natural Gradient Methods: Perspectives, Efficient-Scalable Approximations, and Analysis
Rajesh Shrestha
06 Mar 2023

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Conference on Machine Learning and Systems (MLSys), 2022
Kazuki Osawa, Shigang Li, Torsten Hoefler
25 Nov 2022

Brand New K-FACs: Speeding up K-FAC with Online Decomposition Updates
C. Puiu
16 Oct 2022

Randomized K-FACs: Speeding up K-FAC with Randomized Numerical Linear Algebra
Intelligent Data Engineering and Automated Learning (IDEAL), 2022
C. Puiu
30 Jun 2022

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
IEEE Transactions on Cloud Computing (IEEE TCC), 2022
Lin Zhang, Shaoshuai Shi, Wei Wang, Yue Liu
30 Jun 2022

Rethinking Exponential Averaging of the Fisher
C. Puiu
10 Apr 2022

Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations
Neural Information Processing Systems (NeurIPS), 2022
Alexander Immer, Tycho F. A. van der Ouderaa, Gunnar Rätsch, Vincent Fortuin, Mark van der Wilk
22 Feb 2022

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
Shaoshuai Shi, Lin Zhang, Yue Liu
14 Jul 2021

Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia, Daniel Duckworth, S. Schoenholz, Ethan Dyer, Jascha Narain Sohl-Dickstein
17 Aug 2020

A block coordinate descent optimizer for classification problems exploiting convexity
Ravi G. Patel, N. Trask, Mamikon A. Gulian, E. Cyr
17 Jun 2020

Addressing Catastrophic Forgetting in Few-Shot Problems
International Conference on Machine Learning (ICML), 2020
Pauching Yap, H. Ritter, David Barber
30 Apr 2020