arXiv:1609.04836 (v2, latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (showing 50 of 1,554):
Tackling benign nonconvexity with smoothing and stochastic gradients - Harsh Vardhan, Sebastian U. Stich (18 Feb 2022)
How Do Vision Transformers Work? - Namuk Park, Songkuk Kim [ViT] (14 Feb 2022)
PFGE: Parsimonious Fast Geometric Ensembling of DNNs - Hao Guo, Jiyong Jin, B. Liu [FedML] (14 Feb 2022)
EvoJAX: Hardware-Accelerated Neuroevolution - Yujin Tang, Yingtao Tian, David R Ha (10 Feb 2022)
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning - Yang Zhao, Hao Zhang, Xiuyuan Hu (08 Feb 2022)
Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably - Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao (07 Feb 2022)
Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry - Fabrizio Pittorino, Antonio Ferraro, Gabriele Perugini, Christoph Feinauer, Carlo Baldassi, R. Zecchina (07 Feb 2022)
Anticorrelated Noise Injection for Improved Generalization - Antonio Orvieto, Hans Kersting, F. Proske, Francis R. Bach, Aurelien Lucchi (06 Feb 2022)
Comparative assessment of federated and centralized machine learning - Ibrahim Abdul Majeed, Sagar Kaushik, Aniruddha Bardhan, Venkata Siva Kumar Tadi, Hwang-Ki Min, K. Kumaraguru, Rajasekhara Reddy Duvvuru Muni [FedML] (03 Feb 2022)
Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers - Amir Ardalan Kalantari, Mohammad Amini, Sarath Chandar, Doina Precup (01 Feb 2022)
When Do Flat Minima Optimizers Work? - Jean Kaddour, Linqing Liu, Ricardo M. A. Silva, Matt J. Kusner [ODL] (01 Feb 2022)
Memory-Efficient Backpropagation through Large Linear Layers - Daniel Bershatsky, A. Mikhalev, A. Katrutsa, Julia Gusak, D. Merkulov, Ivan Oseledets (31 Jan 2022)
On the Power-Law Hessian Spectrums in Deep Learning - Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li [ODL] (31 Jan 2022)
Learning Fast, Learning Slow: A General Continual Learning Method based on Complementary Learning System - Elahe Arani, F. Sarfraz, Bahram Zonooz [CLL] (29 Jan 2022)
Zeroth-Order Actor-Critic: An Evolutionary Framework for Sequential Decision Problems - Yuheng Lei, Jianyu Chen, Guojian Zhan, Tao Zhang, Jiangtao Li, Shengbo Eben Li, Sifa Zheng [OffRL] (29 Jan 2022)
ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise - Minjia Zhang, U. Niranjan, Yuxiong He (29 Jan 2022)
Hyperparameter Optimization for COVID-19 Chest X-Ray Classification - I. Hamdi, Muhammad Ridzuan, Mohammad Yaqub [LM&MA] (26 Jan 2022)
Weight Expansion: A New Perspective on Dropout and Generalization - Gao Jin, Xinping Yi, Pengfei Yang, Lijun Zhang, S. Schewe, Xiaowei Huang (23 Jan 2022)
A Comprehensive Study of Vision Transformers on Dense Prediction Tasks - Kishaan Jeeveswaran, Senthilkumar S. Kathiresan, Arnav Varma, Omar Magdy, Bahram Zonooz, Elahe Arani [ViT] (21 Jan 2022)
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape - Devansh Bisla, Jing Wang, A. Choromańska (20 Jan 2022)
Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks - Yang Zhao, Hao Zhang (16 Jan 2022)
Gridiron: A Technique for Augmenting Cloud Workloads with Network Bandwidth Requirements - N. Kodirov, Shane Bergsma, Syed M. Iqbal, Alan J. Hu, Ivan Beschastnikh, Margo Seltzer (12 Jan 2022)
In Defense of the Unitary Scalarization for Deep Multi-Task Learning - Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. P. Kumar (11 Jan 2022)
ThreshNet: An Efficient DenseNet Using Threshold Mechanism to Reduce Connections - Ruikang Ju, Ting-Yu Lin, Jia-Hao Jian, Jen-Shiun Chiang, Weida Yang (09 Jan 2022)
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets - Alethea Power, Yuri Burda, Harrison Edwards, Igor Babuschkin, Vedant Misra (06 Jan 2022)
Class-Incremental Continual Learning into the eXtended DER-verse - Matteo Boschini, Lorenzo Bonicelli, Pietro Buzzega, Angelo Porrello, Simone Calderara [CLL, BDL] (03 Jan 2022)
Stochastic Weight Averaging Revisited - Hao Guo, Jiyong Jin, B. Liu (03 Jan 2022)
Distributed Hybrid CPU and GPU training for Graph Neural Networks on Billion-Scale Graphs - Da Zheng, Xiang Song, Chengrun Yang, Dominique LaSalle, George Karypis [3DH, GNN] (31 Dec 2021)
DRF Codes: Deep SNR-Robust Feedback Codes - Mahdi Boloursaz Mashhadi, Deniz Gunduz, A. Perotti, B. Popović (22 Dec 2021)
A Convergent ADMM Framework for Efficient Neural Network Training - Junxiang Wang, Hongyi Li, Liang Zhao (22 Dec 2021)
The effective noise of Stochastic Gradient Descent - Francesca Mignacco, Pierfrancesco Urbani (20 Dec 2021)
HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images - Meirui Jiang, Zirui Wang, Qi Dou [FedML] (20 Dec 2021)
An Empirical Investigation of the Role of Pre-training in Lifelong Learning - Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell [CLL] (16 Dec 2021)
Sharpness-Aware Minimization with Dynamic Reweighting - Wenxuan Zhou, Fangyu Liu, Huan Zhang, Muhao Chen [AAML] (16 Dec 2021)
Visualizing the Loss Landscape of Winning Lottery Tickets - Robert Bain [UQCV] (16 Dec 2021)
Non-Asymptotic Analysis of Online Multiplicative Stochastic Gradient Descent - Riddhiman Bhattacharya, Tiefeng Jiang (14 Dec 2021)
Image-to-Height Domain Translation for Synthetic Aperture Sonar - Dylan Stewart, Shawn F. Johnson, Alina Zare (12 Dec 2021)
Effective dimension of machine learning models - Amira Abbas, David Sutter, Alessio Figalli, Stefan Woerner (09 Dec 2021)
Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training - Haofei Zhang, Jiarui Duan, Mengqi Xue, Mingli Song, Li Sun, Xiuming Zhang [ViT, AI4CE] (07 Dec 2021)
Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent - Wei Zhang, Mingrui Liu, Yu Feng, Xiaodong Cui, Brian Kingsbury, Yuhai Tu (02 Dec 2021)
On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective - Xiaowu Dai, Yuhua Zhu (02 Dec 2021)
Embedding Principle: a hierarchical structure of loss landscape of deep neural networks - Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Z. Xu (30 Nov 2021)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning - Matías Mendieta, Taojiannan Yang, Pu Wang, Minwoo Lee, Zhengming Ding, Chong Chen [FedML] (28 Nov 2021)
Federated Gaussian Process: Convergence, Automatic Personalization and Multi-fidelity Modeling - Xubo Yue, Raed Al Kontar [FedML] (28 Nov 2021)
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping - Xuran Meng, Jianfeng Yao (26 Nov 2021)
Sharpness-aware Quantization for Deep Neural Networks - Jing Liu, Jianfei Cai, Bohan Zhuang [MQ] (24 Nov 2021)
Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning - Baijiong Lin, Feiyang Ye, Yu Zhang, Ivor W. Tsang (20 Nov 2021)
TransMorph: Transformer for unsupervised medical image registration - Junyu Chen, Eric C. Frey, Yufan He, W. Paul Segars, Ye Li, Yong Du [ViT, MedIm] (19 Nov 2021)
Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits - Hao Chen, Lili Zheng, Raed Al Kontar, Garvesh Raskutti (19 Nov 2021)
Neuron-based Pruning of Deep Neural Networks with Better Generalization using Kronecker Factored Curvature Approximation - Abdolghani Ebrahimi, Diego Klabjan (16 Nov 2021)