2006.09092
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
16 June 2020
Diego Granziol
S. Zohren
Stephen J. Roberts
ODL
Papers citing
"Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training"
10 / 10 papers shown
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
36
3
0
10 Jan 2025
A Cost-Aware Approach to Adversarial Robustness in Neural Networks
Charles Meyers
Mohammad Reza Saleh Sedghpour
Tommy Löfstedt
Erik Elmroth
OOD
AAML
31
0
0
11 Sep 2024
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Guan-Bo Wang
Sijie Cheng
Xianyuan Zhan
Xiangang Li
Sen Song
Yang Liu
ALM
13
227
0
20 Sep 2023
Universal characteristics of deep neural network loss surfaces from random matrix theory
Nicholas P. Baskerville
J. Keating
F. Mezzadri
J. Najnudel
Diego Granziol
22
4
0
17 May 2022
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz
Yasaman Bahri
Ethan Dyer
Jascha Narain Sohl-Dickstein
Guy Gur-Ari
ODL
153
233
0
04 Mar 2020
Cleaning large correlation matrices: tools from random matrix theory
J. Bun
J. Bouchaud
M. Potters
27
262
0
25 Oct 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
273
2,886
0
15 Sep 2016
The Loss Surfaces of Multilayer Networks
A. Choromańska
Mikael Henaff
Michaël Mathieu
Gerard Ben Arous
Yann LeCun
ODL
175
1,184
0
30 Nov 2014
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien
Mark W. Schmidt
Francis R. Bach
116
259
0
10 Dec 2012
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Ohad Shamir
Tong Zhang
99
571
0
08 Dec 2012