Scalable and Practical Natural Gradient for Large-Scale Deep Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
13 February 2020
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota
arXiv:2002.06015

Papers citing "Scalable and Practical Natural Gradient for Large-Scale Deep Learning"

15 papers

Beyond the Mean: Fisher-Orthogonal Projection for Natural Gradient Descent in Large Batch Training
Yishun Lu, Wesley Armour
19 Aug 2025

Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization
Siddharth Singh, Zack Sating, A. Bhatele
18 Oct 2023

Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang, Shaoshuai Shi, Yue Liu
04 Aug 2023

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch
Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler
08 May 2023

Natural Gradient Methods: Perspectives, Efficient-Scalable Approximations, and Analysis
Rajesh Shrestha
06 Mar 2023

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Conference on Machine Learning and Systems (MLSys), 2022
Kazuki Osawa, Shigang Li, Torsten Hoefler
25 Nov 2022

Brand New K-FACs: Speeding up K-FAC with Online Decomposition Updates
C. Puiu
16 Oct 2022

Randomized K-FACs: Speeding up K-FAC with Randomized Numerical Linear Algebra
Intelligent Data Engineering and Automated Learning (IDEAL), 2022
C. Puiu
30 Jun 2022

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
IEEE Transactions on Cloud Computing (IEEE TCC), 2022
Lin Zhang, Shaoshuai Shi, Wei Wang, Yue Liu
30 Jun 2022

Rethinking Exponential Averaging of the Fisher
C. Puiu
10 Apr 2022

Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations
Neural Information Processing Systems (NeurIPS), 2022
Alexander Immer, Tycho F. A. van der Ouderaa, Gunnar Rätsch, Vincent Fortuin, Mark van der Wilk
22 Feb 2022

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
Shaoshuai Shi, Lin Zhang, Yue Liu
14 Jul 2021

Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia, Daniel Duckworth, S. Schoenholz, Ethan Dyer, Jascha Narain Sohl-Dickstein
17 Aug 2020

A block coordinate descent optimizer for classification problems exploiting convexity
Ravi G. Patel, N. Trask, Mamikon A. Gulian, E. Cyr
17 Jun 2020

Addressing Catastrophic Forgetting in Few-Shot Problems
International Conference on Machine Learning (ICML), 2020
Pauching Yap, H. Ritter, David Barber
30 Apr 2020