Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures

1 November 2023
Runa Eschenhagen
Alexander Immer
Richard E. Turner
Frank Schneider
Philipp Hennig

Papers citing "Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures"

20 / 20 papers shown
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
Liming Liu
Zhenghao Xu
Zixuan Zhang
Hao Kang
Zichong Li
Chen Liang
Weizhu Chen
T. Zhao
81
1
0
24 Feb 2025
Position: Curvature Matrices Should Be Democratized via Linear Operators
Felix Dangel
Runa Eschenhagen
Weronika Ormaniec
Andres Fernandez
Lukas Tatzel
Agustinus Kristiadi
48
3
0
31 Jan 2025
Knowledge Distillation with Adapted Weight
Sirong Wu
Xi Luo
Junjie Liu
Yuhui Deng
33
0
0
06 Jan 2025
Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Lukas Tatzel
Bálint Mucsányi
Osane Hackel
Philipp Hennig
43
0
0
18 Oct 2024
Influence Functions for Scalable Data Attribution in Diffusion Models
Bruno Mlodozeniec
Runa Eschenhagen
Juhan Bae
Alexander Immer
David Krueger
Richard E. Turner
TDI
DiffM
75
4
0
17 Oct 2024
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
59
23
0
17 Sep 2024
A framework for measuring the training efficiency of a neural architecture
Eduardo Cueto-Mendoza
John D. Kelleher
38
0
0
12 Sep 2024
A New Perspective on Shampoo's Preconditioner
Depen Morwani
Itai Shapira
Nikhil Vyas
Eran Malach
Sham Kakade
Lucas Janson
20
7
0
25 Jun 2024
Scalable Bayesian Learning with posteriors
Samuel Duffield
Kaelan Donatella
Johnathan Chiu
Phoebe Klett
Daniel Simpson
BDL
UQCV
51
1
0
31 May 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
74
2
0
26 May 2024
Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks
Felix Dangel
Johannes Müller
Marius Zeinhofer
ODL
19
6
0
24 May 2024
Thermodynamic Natural Gradient Descent
Kaelan Donatella
Samuel Duffield
Maxwell Aifer
Denis Melanson
Gavin Crooks
Patrick J. Coles
26
3
0
22 May 2024
Training Data Attribution via Approximate Unrolled Differentiation
Juhan Bae
Wu Lin
Jonathan Lorraine
Roger C. Grosse
TDI
MU
38
12
0
20 May 2024
Variational Stochastic Gradient Descent for Deep Neural Networks
Haotian Chen
Anna Kuzina
Babak Esmaeili
Jakub M. Tomczak
39
0
0
09 Apr 2024
The LLM Surgeon
Tycho F. A. van der Ouderaa
Markus Nagel
M. V. Baalen
Yuki Markus Asano
Tijmen Blankevoort
19
14
0
28 Dec 2023
Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
Wu Lin
Felix Dangel
Runa Eschenhagen
Kirill Neklyudov
Agustinus Kristiadi
Richard E. Turner
Alireza Makhzani
16
3
0
09 Dec 2023
Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization
Frederik Benzing
ODL
35
23
0
28 Jan 2022
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Mohammad Emtiyaz Khan
Didrik Nielsen
Voot Tangkaratt
Wu Lin
Y. Gal
Akash Srivastava
ODL
74
266
0
13 Jun 2018
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
282
39,170
0
01 Sep 2014
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton
Nitish Srivastava
A. Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov
VLM
243
7,620
0
03 Jul 2012