ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Smoothness Analysis of Adversarial Training
Sekitoshi Kanai
Masanori Yamada
Hiroshi Takahashi
Yuki Yamanaka
Yasutoshi Ida
AAML
95
6
0
02 Mar 2021
Acceleration via Fractal Learning Rate Schedules
Naman Agarwal
Surbhi Goel
Cyril Zhang
76
18
0
01 Mar 2021
Siamese Labels Auxiliary Learning
Wenrui Gan
Zhulin Liu
Chong Chen
Tong Zhang
35
2
0
27 Feb 2021
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen
Simran Kaur
Yuanzhi Li
J. Zico Kolter
Ameet Talwalkar
ODL
131
279
0
26 Feb 2021
Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
Gregory W. Benton
Wesley J. Maddox
Sanae Lotfi
A. Wilson
UQCV
126
70
0
25 Feb 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li
Sadhika Malladi
Sanjeev Arora
104
80
0
24 Feb 2021
Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization
Tianyi Liu
Yan Li
S. Wei
Enlu Zhou
T. Zhao
65
13
0
24 Feb 2021
Inductive Bias of Multi-Channel Linear Convolutional Networks with Bounded Weight Norm
Meena Jagadeesan
Ilya P. Razenshteyn
Suriya Gunasekar
113
21
0
24 Feb 2021
The Promises and Pitfalls of Deep Kernel Learning
Sebastian W. Ober
C. Rasmussen
Mark van der Wilk
UQCV BDL
82
109
0
24 Feb 2021
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Jungmin Kwon
Jeongseop Kim
Hyunseong Park
I. Choi
124
291
0
23 Feb 2021
The Uncanny Similarity of Recurrence and Depth
Avi Schwarzschild
Arjun Gupta
Amin Ghiasi
Micah Goldblum
Tom Goldstein
83
10
0
22 Feb 2021
Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation
Shaoxiong Feng
Xuancheng Ren
Kan Li
Xu Sun
62
11
0
22 Feb 2021
Non-Convex Optimization with Spectral Radius Regularization
Adam Sandler
Diego Klabjan
Yuan Luo
ODL
45
1
0
22 Feb 2021
Formal Language Theory Meets Modern NLP
William Merrill
AI4CE NAI
112
13
0
19 Feb 2021
Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence
Karl Bäckström
Ivan Walulya
Marina Papatriantafilou
P. Tsigas
67
5
0
17 Feb 2021
SWAD: Domain Generalization by Seeking Flat Minima
Junbum Cha
Sanghyuk Chun
Kyungjae Lee
Han-Cheol Cho
Seunghyun Park
Yunsung Lee
Sungrae Park
MoMe
311
460
0
17 Feb 2021
Generating Structured Adversarial Attacks Using Frank-Wolfe Method
Ehsan Kazemi
Thomas Kerdreux
Liquang Wang
AAML DiffM
48
1
0
15 Feb 2021
Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks
Frank Schneider
Felix Dangel
Philipp Hennig
74
10
0
12 Feb 2021
Noisy Recurrent Neural Networks
Soon Hoe Lim
N. Benjamin Erichson
Liam Hodgkinson
Michael W. Mahoney
93
54
0
09 Feb 2021
Consensus Control for Decentralized Deep Learning
Lingjing Kong
Tao R. Lin
Anastasia Koloskova
Martin Jaggi
Sebastian U. Stich
53
79
0
09 Feb 2021
SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality
Courtney Paquette
Kiwon Lee
Fabian Pedregosa
Elliot Paquette
59
35
0
08 Feb 2021
Eliminating Sharp Minima from SGD with Truncated Heavy-tailed Noise
Xingyu Wang
Sewoong Oh
C. Rhee
75
17
0
08 Feb 2021
Adversarial Training Makes Weight Loss Landscape Sharper in Logistic Regression
Masanori Yamada
Sekitoshi Kanai
Tomoharu Iwata
Tomokatsu Takahashi
Yuki Yamanaka
Hiroshi Takahashi
Atsutoshi Kumagai
AAML
124
9
0
05 Feb 2021
Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models
Shang Wang
Peiming Yang
Yuxuan Zheng
Xuelong Li
Gennady Pekhimenko
82
22
0
03 Feb 2021
Information-Theoretic Generalization Bounds for Stochastic Gradient Descent
Gergely Neu
Gintare Karolina Dziugaite
Mahdi Haghifam
Daniel M. Roy
128
90
0
01 Feb 2021
Exploring the Geometry and Topology of Neural Network Loss Landscapes
Stefan Horoi
Je-chun Huang
Bastian Rieck
Guillaume Lajoie
Guy Wolf
Smita Krishnaswamy
45
13
0
31 Jan 2021
Modelling Sovereign Credit Ratings: Evaluating the Accuracy and Driving Factors using Machine Learning Techniques
B. Overes
Michel van der Wel
17
6
0
29 Jan 2021
On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L. Smith
Benoit Dherin
David Barrett
Soham De
MLT
62
204
0
28 Jan 2021
cGANs for Cartoon to Real-life Images
P. Rajput
Kanya Satis
Sonnya Dellarosa
Wenxuan Huang
Obinna Agba
GAN
55
2
0
24 Jan 2021
Predicting the Mechanical Properties of Biopolymer Gels Using Neural Networks Trained on Discrete Fiber Network Data
Yue Leng
Vahidullah Tac
S. Calve
A. B. Tepole
86
32
0
23 Jan 2021
A shallow neural model for relation prediction
Caglar Demir
Diego Moussallem
A. N. Ngomo
45
11
0
22 Jan 2021
Robustness to Augmentations as a Generalization metric
Sumukh K Aithal
D. Kashyap
Natarajan Subramanyam
OOD
36
18
0
16 Jan 2021
BN-invariant sharpness regularizes the training model to better generalization
Mingyang Yi
Huishuai Zhang
Wei Chen
Zhi-Ming Ma
Tie-Yan Liu
128
3
0
08 Jan 2021
Accelerating Training of Batch Normalization: A Manifold Perspective
Mingyang Yi
24
3
0
08 Jan 2021
A spin-glass model for the loss surfaces of generative adversarial networks
Nicholas P. Baskerville
J. Keating
F. Mezzadri
J. Najnudel
GAN
88
12
0
07 Jan 2021
Topological obstructions in neural networks learning
S. Barannikov
Daria Voronkova
I. Trofimov
Alexander Korotin
Grigorii Sotnikov
Evgeny Burnaev
39
6
0
31 Dec 2020
Optimizing Deeper Transformers on Small Datasets
Peng Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
Jackie C.K. Cheung
S. Prince
Yanshuai Cao
AI4CE
109
69
0
30 Dec 2020
Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability
Sangho Yeo
Minho Bae
Minjoong Jeong
Oh-Kyoung Kwon
Sangyoon Oh
57
3
0
30 Dec 2020
Mathematical Models of Overparameterized Neural Networks
Cong Fang
Hanze Dong
Tong Zhang
181
23
0
27 Dec 2020
Understanding Decoupled and Early Weight Decay
Johan Bjorck
Kilian Q. Weinberger
Carla P. Gomes
61
25
0
27 Dec 2020
Recent advances in deep learning theory
Fengxiang He
Dacheng Tao
AI4CE
130
51
0
20 Dec 2020
Combating Mode Collapse in GAN training: An Empirical Analysis using Hessian Eigenvalues
Ricard Durall
Avraam Chatzimichailidis
P. Labus
J. Keuper
GAN
77
62
0
17 Dec 2020
Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
Fengli Gao
Huicai Zhong
ODL
35
10
0
16 Dec 2020
DeepLesionBrain: Towards a broader deep-learning generalization for multiple sclerosis lesion segmentation
R. A. Kamraoui
Vinh-Thong Ta
T. Tourdias
Boris Mansencal
J. V. Manjón
Pierrick Coupé
OOD
120
54
0
14 Dec 2020
Warm Starting CMA-ES for Hyperparameter Optimization
Masahiro Nomura
Shuhei Watanabe
Youhei Akimoto
Yoshihiko Ozaki
Masaki Onishi
93
43
0
13 Dec 2020
Enhance Convolutional Neural Networks with Noise Incentive Block
Menghan Xia
Yi Wang
Chu Han
T. Wong
40
1
0
09 Dec 2020
Generalization bounds for deep learning
Guillermo Valle Pérez
A. Louis
BDL
82
45
0
07 Dec 2020
A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization
Adepu Ravi Sankar
Yash Khasbage
Rahul Vigneswaran
V. Balasubramanian
89
44
0
07 Dec 2020
Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
Kangqiao Liu
Liu Ziyin
Masakuni Ueda
MLT
149
39
0
07 Dec 2020
Why Unsupervised Deep Networks Generalize
Anita de Mello Koch
E. Koch
R. Koch
OOD
44
8
0
07 Dec 2020