1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Title
A Neural Network Based Choice Model for Assortment Optimization
Hanrui Wang
Zhongze Cai
Xiaocheng Li
Kalyan Talluri
42
2
0
10 Aug 2023
G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
Xingyu Li
Bo Tang
AAML
46
0
0
07 Aug 2023
The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning
Nikhil Ghosh
Spencer Frei
Wooseok Ha
Ting Yu
MLT
61
3
0
06 Aug 2023
Model Provenance via Model DNA
Xin Mu
Yu Wang
Yehong Zhang
Jiaqi Zhang
Haibo Wang
Yang Xiang
Yue Yu
SyDa
58
0
0
04 Aug 2023
Feature Noise Boosts DNN Generalization under Label Noise
Lu Zeng
Xuan Chen
Xiaoshuang Shi
Jikang Cheng
MLT
NoLa
56
2
0
03 Aug 2023
Arithmetic with Language Models: from Memorization to Computation
Davide Maltoni
Matteo Ferrara
KELM
LRM
84
7
0
02 Aug 2023
Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy
Shibo Jie
Haoqing Wang
Zhiwei Deng
76
34
0
31 Jul 2023
Lookbehind-SAM: k steps back, 1 step forward
Gonçalo Mordido
Pranshu Malviya
A. Baratin
Sarath Chandar
AAML
90
1
0
31 Jul 2023
GeneMask: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning
Soumyadeep Roy
Jonas Wallat
Sowmya S. Sundaram
Wolfgang Nejdl
Niloy Ganguly
59
3
0
29 Jul 2023
Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation
Jianan Fan
Dongnan Liu
Hang Chang
Heng-Chiao Huang
Mei Chen
Weidong (Tom) Cai
OOD
91
9
0
27 Jul 2023
Modify Training Directions in Function Space to Reduce Generalization Error
Yi Yu
Wenlian Lu
Boyu Chen
71
0
0
25 Jul 2023
The instabilities of large learning rate training: a loss landscape view
Lawrence Wang
Stephen J. Roberts
17
2
0
22 Jul 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen
Zhiyuan Li
Tengyu Ma
FAtt
98
29
0
20 Jul 2023
Flatness-Aware Minimization for Domain Generalization
Xingxuan Zhang
Renzhe Xu
Han Yu
Yancheng Dong
Pengfei Tian
Peng Cui
83
22
0
20 Jul 2023
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Pranshu Malviya
Gonçalo Mordido
A. Baratin
Reza Babanezhad Harikandeh
Jerry Huang
Simon Lacoste-Julien
Razvan Pascanu
Sarath Chandar
ODL
36
1
0
18 Jul 2023
Sharpness-Aware Graph Collaborative Filtering
Huiyuan Chen
Chin-Chia Michael Yeh
Yujie Fan
Yan Zheng
Junpeng Wang
Vivian Lai
Mahashweta Das
Hao Yang
75
5
0
18 Jul 2023
Snapshot Spectral Clustering -- a costless approach to deep clustering ensembles generation
Adam Piróg
Halina Kwasnicka
44
1
0
17 Jul 2023
DOT: A Distillation-Oriented Trainer
Borui Zhao
Quan Cui
Renjie Song
Jiajun Liang
55
7
0
17 Jul 2023
Accelerating Distributed ML Training via Selective Synchronization
S. Tyagi
Martin Swany
FedML
84
4
0
16 Jul 2023
The Interpolating Information Criterion for Overparameterized Models
Liam Hodgkinson
Christopher van der Heide
Roberto Salomone
Fred Roosta
Michael W. Mahoney
72
9
0
15 Jul 2023
Variance-reduced accelerated methods for decentralized stochastic double-regularized nonconvex strongly-concave minimax problems
Gabriel Mancino-Ball
Yangyang Xu
116
8
0
14 Jul 2023
Memorization Through the Lens of Curvature of Loss Function Around Samples
Isha Garg
Deepak Ravikumar
Kaushik Roy
TDI
65
13
0
11 Jul 2023
Implicit regularisation in stochastic gradient descent: from single-objective to two-player games
Mihaela Rosca
M. Deisenroth
58
2
0
11 Jul 2023
On the curvature of the loss landscape
Alison Pouplin
Hrittik Roy
Sidak Pal Singh
Georgios Arvanitidis
54
1
0
10 Jul 2023
Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness
C. Hartmann
Lorenz Richter
AAML
52
2
0
05 Jul 2023
FAM: Relative Flatness Aware Minimization
Linara Adilova
Amr Abourayya
Jianning Li
Amin Dada
Henning Petzka
Jan Egger
Jens Kleesiek
Michael Kamp
ODL
47
1
0
05 Jul 2023
CAME: Confidence-guided Adaptive Memory Efficient Optimization
Yang Luo
Xiaozhe Ren
Zangwei Zheng
Zhuo Jiang
Xin Jiang
Yang You
ODL
84
22
0
05 Jul 2023
Sparsity-aware generalization theory for deep neural networks
Ramchandran Muthukumar
Jeremias Sulam
MLT
42
7
0
01 Jul 2023
Towards Brain Inspired Design for Addressing the Shortcomings of ANNs
F. Sarfraz
Elahe Arani
Bahram Zonooz
31
1
0
30 Jun 2023
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Peng Mi
Li Shen
Tianhe Ren
Yiyi Zhou
Tianshuo Xu
Xiaoshuai Sun
Tongliang Liu
Rongrong Ji
Dacheng Tao
AAML
63
2
0
30 Jun 2023
The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks
Mor Shpigel Nacson
Rotem Mulayoff
Greg Ongie
T. Michaeli
Daniel Soudry
84
13
0
30 Jun 2023
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses
Jeongmin Brian Park
Vikram Sharma Mailthody
Zaid Qureshi
Wen-mei W. Hwu
GNN
75
13
0
28 Jun 2023
Black holes and the loss landscape in machine learning
P. Kumar
Taniya Mandal
Swapnamay Mondal
64
2
0
26 Jun 2023
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
Anna Bair
Hongxu Yin
Maying Shen
Pavlo Molchanov
J. Álvarez
99
12
0
25 Jun 2023
BatchGNN: Efficient CPU-Based Distributed GNN Training on Very Large Graphs
Loc Hoang
Rita Brugarolas Brufau
Ke Ding
Bo Wu
GNN
61
2
0
23 Jun 2023
Scaling MLPs: A Tale of Inductive Bias
Gregor Bachmann
Sotiris Anagnostidis
Thomas Hofmann
101
38
0
23 Jun 2023
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Pascal Junior Tikeng Notsawo
Hattie Zhou
Mohammad Pezeshki
Irina Rish
G. Dumas
100
24
0
23 Jun 2023
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization
Khashayar Gatmiry
Zhiyuan Li
Ching-Yao Chuang
Sashank J. Reddi
Tengyu Ma
Stefanie Jegelka
ODL
77
12
0
22 Jun 2023
PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning
Hojoon Lee
Hanseul Cho
Hyunseung Kim
Daehoon Gwak
Joonkee Kim
Jaegul Choo
Se-Young Yun
Chulhee Yun
OffRL
157
30
0
19 Jun 2023
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Guanhua Wang
Heyang Qin
S. A. Jacobs
Connor Holmes
Samyam Rajbhandari
Olatunji Ruwase
Feng Yan
Lei Yang
Yuxiong He
VLM
110
59
0
16 Jun 2023
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Dongkuk Si
Chulhee Yun
108
15
0
16 Jun 2023
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas
Depen Morwani
Rosie Zhao
Gal Kaplun
Sham Kakade
Boaz Barak
MLT
79
4
0
14 Jun 2023
Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
Shahar Stein Ioushua
Inbar Hasidim
O. Shayevitz
M. Feder
54
0
0
14 Jun 2023
Exact Mean Square Linear Stability Analysis for SGD
Rotem Mulayoff
T. Michaeli
MLT
59
2
0
13 Jun 2023
Unveiling the Hessian's Connection to the Decision Boundary
Mahalakshmi Sabanayagam
Freya Behrens
Urte Adomaityte
Anna Dawid
54
5
0
12 Jun 2023
Gradient Ascent Post-training Enhances Language Model Generalization
Dongkeun Yoon
Joel Jang
Sungdong Kim
Minjoon Seo
VLM
AI4CE
77
3
0
12 Jun 2023
An information-Theoretic Approach to Semi-supervised Transfer Learning
Daniel Jakubovitz
David Uliel
Miguel R. D. Rodrigues
Raja Giryes
56
1
0
11 Jun 2023
Differentially Private Sharpness-Aware Training
Jinseong Park
Hoki Kim
Yujin Choi
Jaewook Lee
83
8
0
09 Jun 2023
Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn
B. Rosenow
76
3
0
08 Jun 2023
Boosting Adversarial Transferability by Achieving Flat Local Maxima
Zhijin Ge
Hongying Liu
Xiaosen Wang
Fanhua Shang
Yuanyuan Liu
AAML
84
48
0
08 Jun 2023