ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.


On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Title
Tackling benign nonconvexity with smoothing and stochastic gradients
Harsh Vardhan
Sebastian U. Stich
91
8
0
18 Feb 2022
How Do Vision Transformers Work?
Namuk Park
Songkuk Kim
ViT
124
485
0
14 Feb 2022
PFGE: Parsimonious Fast Geometric Ensembling of DNNs
Hao Guo
Jiyong Jin
B. Liu
FedML
72
1
0
14 Feb 2022
EvoJAX: Hardware-Accelerated Neuroevolution
Yujin Tang
Yingtao Tian
David R Ha
102
42
0
10 Feb 2022
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
Yang Zhao
Hao Zhang
Xiuyuan Hu
143
122
0
08 Feb 2022
Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
Tianyi Liu
Yan Li
Enlu Zhou
Tuo Zhao
62
1
0
07 Feb 2022
Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry
Fabrizio Pittorino
Antonio Ferraro
Gabriele Perugini
Christoph Feinauer
Carlo Baldassi
R. Zecchina
263
26
0
07 Feb 2022
Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto
Hans Kersting
F. Proske
Francis R. Bach
Aurelien Lucchi
116
48
0
06 Feb 2022
Comparative assessment of federated and centralized machine learning
Ibrahim Abdul Majeed
Sagar Kaushik
Aniruddha Bardhan
Venkata Siva Kumar Tadi
Hwang-Ki Min
K. Kumaraguru
Rajasekhara Reddy Duvvuru Muni
FedML
45
7
0
03 Feb 2022
Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers
Amir Ardalan Kalantari
Mohammad Amini
Sarath Chandar
Doina Precup
83
4
0
01 Feb 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
134
64
0
01 Feb 2022
Memory-Efficient Backpropagation through Large Linear Layers
Daniel Bershatsky
A. Mikhalev
A. Katrutsa
Julia Gusak
D. Merkulov
Ivan Oseledets
70
4
0
31 Jan 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
ODL
99
10
0
31 Jan 2022
Learning Fast, Learning Slow: A General Continual Learning Method based on Complementary Learning System
Elahe Arani
F. Sarfraz
Bahram Zonooz
CLL
175
134
0
29 Jan 2022
Zeroth-Order Actor-Critic: An Evolutionary Framework for Sequential Decision Problems
Yuheng Lei
Jianyu Chen
Guojian Zhan
Tao Zhang
Jiangtao Li
Jianyu Chen
Shengbo Eben Li
Sifa Zheng
OffRL
82
3
0
29 Jan 2022
ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise
Minjia Zhang
U. Niranjan
Yuxiong He
51
1
0
29 Jan 2022
Hyperparameter Optimization for COVID-19 Chest X-Ray Classification
I. Hamdi
Muhammad Ridzuan
Mohammad Yaqub
LM&MA
199
0
0
26 Jan 2022
Weight Expansion: A New Perspective on Dropout and Generalization
Gao Jin
Xinping Yi
Pengfei Yang
Lijun Zhang
S. Schewe
Xiaowei Huang
85
5
0
23 Jan 2022
A Comprehensive Study of Vision Transformers on Dense Prediction Tasks
Kishaan Jeeveswaran
Senthilkumar S. Kathiresan
Arnav Varma
Omar Magdy
Bahram Zonooz
Elahe Arani
ViT
51
10
0
21 Jan 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
104
37
0
20 Jan 2022
Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks
Yang Zhao
Hao Zhang
55
2
0
16 Jan 2022
Gridiron: A Technique for Augmenting Cloud Workloads with Network Bandwidth Requirements
N. Kodirov
Shane Bergsma
Syed M. Iqbal
Alan J. Hu
Ivan Beschastnikh
Margo Seltzer
25
0
0
12 Jan 2022
In Defense of the Unitary Scalarization for Deep Multi-Task Learning
Vitaly Kurin
Alessandro De Palma
Ilya Kostrikov
Shimon Whiteson
M. P. Kumar
96
75
0
11 Jan 2022
ThreshNet: An Efficient DenseNet Using Threshold Mechanism to Reduce Connections
Ruikang Ju
Ting-Yu Lin
Jia-Hao Jian
Jen-Shiun Chiang
Weida Yang
49
9
0
09 Jan 2022
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Alethea Power
Yuri Burda
Harrison Edwards
Igor Babuschkin
Vedant Misra
107
366
0
06 Jan 2022
Class-Incremental Continual Learning into the eXtended DER-verse
Matteo Boschini
Lorenzo Bonicelli
Pietro Buzzega
Angelo Porrello
Simone Calderara
CLL, BDL
109
142
0
03 Jan 2022
Stochastic Weight Averaging Revisited
Hao Guo
Jiyong Jin
B. Liu
85
30
0
03 Jan 2022
Distributed Hybrid CPU and GPU training for Graph Neural Networks on Billion-Scale Graphs
Da Zheng
Xiang Song
Chengrun Yang
Dominique LaSalle
George Karypis
3DH, GNN
90
58
0
31 Dec 2021
DRF Codes: Deep SNR-Robust Feedback Codes
Mahdi Boloursaz Mashhadi
Deniz Gunduz
A. Perotti
B. Popović
52
11
0
22 Dec 2021
A Convergent ADMM Framework for Efficient Neural Network Training
Junxiang Wang
Hongyi Li
Liang Zhao
62
1
0
22 Dec 2021
The effective noise of Stochastic Gradient Descent
Francesca Mignacco
Pierfrancesco Urbani
69
39
0
20 Dec 2021
HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images
Meirui Jiang
Zirui Wang
Qi Dou
FedML
130
133
0
20 Dec 2021
An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta
Darshan Patil
Sarath Chandar
Emma Strubell
CLL
148
145
0
16 Dec 2021
Sharpness-Aware Minimization with Dynamic Reweighting
Wenxuan Zhou
Fangyu Liu
Huan Zhang
Muhao Chen
AAML
48
8
0
16 Dec 2021
Visualizing the Loss Landscape of Winning Lottery Tickets
Robert Bain
UQCV
70
3
0
16 Dec 2021
Non-Asymptotic Analysis of Online Multiplicative Stochastic Gradient Descent
Riddhiman Bhattacharya
Tiefeng Jiang
54
0
0
14 Dec 2021
Image-to-Height Domain Translation for Synthetic Aperture Sonar
Dylan Stewart
Shawn F. Johnson
Alina Zare
66
5
0
12 Dec 2021
Effective dimension of machine learning models
Amira Abbas
David Sutter
Alessio Figalli
Stefan Woerner
121
18
0
09 Dec 2021
Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training
Haofei Zhang
Jiarui Duan
Mengqi Xue
Mingli Song
Li Sun
Xiuming Zhang
ViT, AI4CE
97
16
0
07 Dec 2021
Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent
Wei Zhang
Mingrui Liu
Yu Feng
Xiaodong Cui
Brian Kingsbury
Yuhai Tu
50
3
0
02 Dec 2021
On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective
Xiaowu Dai
Yuhua Zhu
42
4
0
02 Dec 2021
Embedding Principle: a hierarchical structure of loss landscape of deep neural networks
Yaoyu Zhang
Yuqing Li
Zhongwang Zhang
Yaoyu Zhang
Z. Xu
84
23
0
30 Nov 2021
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
Matías Mendieta
Taojiannan Yang
Pu Wang
Minwoo Lee
Zhengming Ding
Chong Chen
FedML
147
165
0
28 Nov 2021
Federated Gaussian Process: Convergence, Automatic Personalization and Multi-fidelity Modeling
Xubo Yue
Raed Al Kontar
FedML
124
9
0
28 Nov 2021
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping
Xuran Meng
Jianfeng Yao
91
7
0
26 Nov 2021
Sharpness-aware Quantization for Deep Neural Networks
Jing Liu
Jianfei Cai
Bohan Zhuang
MQ
155
25
0
24 Nov 2021
Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning
Baijiong Lin
Feiyang Ye
Yu Zhang
Ivor W. Tsang
109
99
0
20 Nov 2021
TransMorph: Transformer for unsupervised medical image registration
Junyu Chen
Eric C. Frey
Yufan He
W. Paul Segars
Ye Li
Yong Du
ViT, MedIm
199
328
0
19 Nov 2021
Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits
Hao Chen
Lili Zheng
Raed Al Kontar
Garvesh Raskutti
81
3
0
19 Nov 2021
Neuron-based Pruning of Deep Neural Networks with Better Generalization using Kronecker Factored Curvature Approximation
Abdolghani Ebrahimi
Diego Klabjan
31
4
0
16 Nov 2021