ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 1609.04836
Versions: v1, v2 (latest)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Papaya: Practical, Private, and Scalable Federated Learning
Dzmitry Huba
John Nguyen
Kshitiz Malik
Ruiyu Zhu
Michael G. Rabbat
...
H. Srinivas
Kaikai Wang
Anthony Shoumikhin
Jesik Min
Mani Malek
FedML
152
141
0
08 Nov 2021
Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi
Masaaki Imaizumi
97
4
0
07 Nov 2021
Dropout in Training Neural Networks: Flatness of Solution and Noise Structure
Zhongwang Zhang
Hanxu Zhou
Zhi-Qin John Xu
ODL
63
2
0
01 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
83
15
0
01 Nov 2021
GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily
Lun Du
Xiaozhou Shi
Qiang Fu
Xiaojun Ma
Hengyu Liu
Shi Han
Dongmei Zhang
127
114
0
29 Oct 2021
CAP: Co-Adversarial Perturbation on Weights and Features for Improving Generalization of Graph Neural Networks
Hao Xue
Kaixiong Zhou
Tianlong Chen
Kai Guo
Helen Zhou
Yi Chang
Xin Wang
AAML
71
15
0
28 Oct 2021
Masked LARk: Masked Learning, Aggregation and Reporting worKflow
Joseph J. Pfeiffer
Denis Xavier Charles
Davis Gilton
Young Hun Jung
Mehul Parsana
Erik Anderson
74
11
0
27 Oct 2021
Multilayer Lookahead: a Nested Version of Lookahead
Denys Pushkin
Luis Barba
97
1
0
27 Oct 2021
RoMA: Robust Model Adaptation for Offline Model-based Optimization
Sihyun Yu
SungSoo Ahn
Le Song
Jinwoo Shin
OffRL
95
36
0
27 Oct 2021
Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD
Bohan Wang
Huishuai Zhang
Jieyu Zhang
Qi Meng
Wei Chen
Tie-Yan Liu
22
1
0
26 Oct 2021
Stable Anderson Acceleration for Deep Learning
Massimiliano Lupo Pasini
Junqi Yin
Viktor Reshniak
M. Stoyanov
59
4
0
26 Oct 2021
Generalized Resubstitution for Classification Error Estimation
P. Ghane
U. Braga-Neto
16
2
0
23 Oct 2021
Feature Learning and Signal Propagation in Deep Neural Networks
Yizhang Lou
Chris Mingard
Yoonsoo Nam
Soufiane Hayou
MDE
82
18
0
22 Oct 2021
Boosting Resource-Constrained Federated Learning Systems with Guessed Updates
Mohamed Yassine Boukhari
Akash Dhasade
Anne-Marie Kermarrec
Rafael Pires
Othmane Safsafi
Rishi Sharma
FedML
75
0
0
21 Oct 2021
Test time Adaptation through Perturbation Robustness
Prabhu Teja Sivaprasad
François Fleuret
TTA OOD
69
34
0
19 Oct 2021
Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri
H. Mobahi
Yi Tay
182
104
0
16 Oct 2021
Trade-offs of Local SGD at Scale: An Empirical Study
Jose Javier Gonzalez Ortiz
Jonathan Frankle
Michael G. Rabbat
Ari S. Morcos
Nicolas Ballas
FedML
86
18
0
15 Oct 2021
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li
Tianhao Wang
Sanjeev Arora
MLT
121
105
0
13 Oct 2021
The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
R. Entezari
Hanie Sedghi
O. Saukh
Behnam Neyshabur
MoMe
102
238
0
12 Oct 2021
Not all noise is accounted equally: How differentially private learning benefits from large sampling rates
Friedrich Dörmann
Osvald Frisk
L. Andersen
Christian Fischer Pedersen
FedML
98
25
0
12 Oct 2021
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang
Hua Wang
Weijie J. Su
96
8
0
11 Oct 2021
Observations on K-image Expansion of Image-Mixing Augmentation for Classification
Joonhyun Jeong
Sungmin Cha
Jongwon Choi
Sangdoo Yun
Taesup Moon
Y. Yoo
VLM
90
7
0
08 Oct 2021
Does Momentum Change the Implicit Regularization on Separable Data?
Bohan Wang
Qi Meng
Huishuai Zhang
Ruoyu Sun
Wei Chen
Zhirui Ma
Tie-Yan Liu
99
18
0
08 Oct 2021
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Jiawei Du
Hanshu Yan
Jiashi Feng
Qiufeng Wang
Liangli Zhen
Rick Siow Mong Goh
Vincent Y. F. Tan
AAML
177
135
0
07 Oct 2021
Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting
Chengyu Dong
Liyuan Liu
Jingbo Shang
NoLa AAML
119
20
0
07 Oct 2021
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML MLT
124
26
0
07 Oct 2021
Spectral Bias in Practice: The Role of Function Frequency in Generalization
Sara Fridovich-Keil
Raphael Gontijo-Lopes
Rebecca Roelofs
107
30
0
06 Oct 2021
Perturbated Gradients Updating within Unit Space for Deep Learning
Ching-Hsun Tseng
Liu Cheng
Shin-Jye Lee
Xiaojun Zeng
111
5
0
01 Oct 2021
Accelerating Encrypted Computing on Intel GPUs
Yujia Zhai
Mohannad Ibrahim
Yiqin Qiu
Fabian Boemer
Zizhong Chen
Alexey Titov
Alexander Lyashevsky
130
26
0
29 Sep 2021
Second-Order Neural ODE Optimizer
Guan-Horng Liu
T. Chen
Evangelos A. Theodorou
77
15
0
29 Sep 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping
Micah Goldblum
Phillip E. Pope
Michael Moeller
Tom Goldstein
173
76
0
29 Sep 2021
Scalable deeper graph neural networks for high-performance materials property prediction
Sadman Sadeed Omee
Steph-Yves M. Louis
Nihang Fu
Lai Wei
Sourin Dey
Rongzhi Dong
Qinyang Li
Jianjun Hu
132
77
0
25 Sep 2021
Towards Generalized and Incremental Few-Shot Object Detection
Yiting Li
H. Zhu
Jun Ma
C. Teo
Chen Xiang
P. Vadakkepat
T. Lee
CLL ObjD
66
9
0
23 Sep 2021
Patch-based Medical Image Segmentation using Matrix Product State Tensor Networks
Raghavendra Selvan
Erik Dam
Soren Alexander Flensborg
Jens Petersen
MedIm
98
2
0
15 Sep 2021
DHA: End-to-End Joint Optimization of Data Augmentation Policy, Hyper-parameter and Architecture
Kaichen Zhou
Lanqing Hong
Shuailiang Hu
Fengwei Zhou
Binxin Ru
Jiashi Feng
Zhenguo Li
84
10
0
13 Sep 2021
Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
Runxin Xu
Fuli Luo
Zhiyuan Zhang
Chuanqi Tan
Baobao Chang
Songfang Huang
Fei Huang
LRM
193
190
0
13 Sep 2021
A Continuous Optimisation Benchmark Suite from Neural Network Regression
K. Malan
C. Cleghorn
ODL
39
1
0
12 Sep 2021
MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning
T. Alkhalifah
Hanchen Wang
O. Ovcharenko
OOD
99
68
0
11 Sep 2021
Adversarial Parameter Defense by Multi-Step Risk Minimization
Zhiyuan Zhang
Ruixuan Luo
Xuancheng Ren
Qi Su
Liangyou Li
Xu Sun
AAML
64
6
0
07 Sep 2021
Deep Convolutional Neural Networks Predict Elasticity Tensors and their Bounds in Homogenization
B. Eidel
3DV
35
2
0
04 Sep 2021
How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Zhiyuan Zhang
Lingjuan Lyu
Weiqiang Wang
Lichao Sun
Xu Sun
86
36
0
03 Sep 2021
The Impact of Reinitialization on Generalization in Convolutional Neural Networks
Ibrahim Alabdulmohsin
Hartmut Maennel
Daniel Keysers
AI4CE
61
21
0
01 Sep 2021
HAT4RD: Hierarchical Adversarial Training for Rumor Detection on Social Media
Shiwen Ni
Jiawen Li
Hung-Yu Kao
72
7
0
29 Aug 2021
DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks
Shiwen Ni
Jiawen Li
Hung-Yu Kao
AAML
61
4
0
29 Aug 2021
Re-using Adversarial Mask Discriminators for Test-time Training under Distribution Shifts
Gabriele Valvano
Andrea Leo
Sotirios A. Tsaftaris
74
6
0
26 Aug 2021
Measurement of Hybrid Rocket Solid Fuel Regression Rate for a Slab Burner using Deep Learning
Gabriel Surina
G. Georgalis
Siddhant S. Aphale
A. Patra
P. DesJardin
8
11
0
25 Aug 2021
Shift-Curvature, SGD, and Generalization
Arwen V. Bradley
C. Gomez-Uribe
Manish Reddy Vuyyuru
62
3
0
21 Aug 2021
Learning from Images: Proactive Caching with Parallel Convolutional Neural Networks
Yantong Wang
Ye Hu
Zhaohui Yang
Walid Saad
Kai‐Kit Wong
V. Friderikos
134
4
0
15 Aug 2021
Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data
Yan Li
Caleb Ju
Ethan X. Fang
T. Zhao
69
9
0
15 Aug 2021
Logit Attenuating Weight Normalization
Aman Gupta
R. Ramanath
Jun Shi
Anika Ramachandran
Sirou Zhou
Mingzhou Zhou
S. Keerthi
75
1
0
12 Aug 2021