Scalable and Practical Natural Gradient for Large-Scale Deep Learning (arXiv:2002.06015)

13 February 2020
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota

Papers citing "Scalable and Practical Natural Gradient for Large-Scale Deep Learning"

9 / 9 papers shown
Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang, S. Shi, Bo-wen Li
04 Aug 2023

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Kazuki Osawa, Shigang Li, Torsten Hoefler
Community: AI4CE
25 Nov 2022

Brand New K-FACs: Speeding up K-FAC with Online Decomposition Updates
C. Puiu
16 Oct 2022

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, S. Shi, Wei Wang, Bo-wen Li
30 Jun 2022

Rethinking Exponential Averaging of the Fisher
C. Puiu
10 Apr 2022

Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations
Alexander Immer, Tycho F. A. van der Ouderaa, Gunnar Rätsch, Vincent Fortuin, Mark van der Wilk
Community: BDL
22 Feb 2022

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
S. Shi, Lin Zhang, Bo-wen Li
14 Jul 2021

Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia, Daniel Duckworth, S. Schoenholz, Ethan Dyer, Jascha Narain Sohl-Dickstein
17 Aug 2020

A block coordinate descent optimizer for classification problems exploiting convexity
Ravi G. Patel, N. Trask, Mamikon A. Gulian, E. Cyr
Community: ODL
17 Jun 2020