ResearchTrend.AI

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
A Neural Network Based Choice Model for Assortment Optimization
Hanrui Wang
Zhongze Cai
Xiaocheng Li
Kalyan Talluri
42
2
0
10 Aug 2023
G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
Xingyu Li
Bo Tang
AAML
46
0
0
07 Aug 2023
The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning
Nikhil Ghosh
Spencer Frei
Wooseok Ha
Ting Yu
MLT
61
3
0
06 Aug 2023
Model Provenance via Model DNA
Xin Mu
Yu Wang
Yehong Zhang
Jiaqi Zhang
Haibo Wang
Yang Xiang
Yue Yu
SyDa
58
0
0
04 Aug 2023
Feature Noise Boosts DNN Generalization under Label Noise
Lu Zeng
Xuan Chen
Xiaoshuang Shi
Jikang Cheng
MLT, NoLa
56
2
0
03 Aug 2023
Arithmetic with Language Models: from Memorization to Computation
Davide Maltoni
Matteo Ferrara
KELM, LRM
84
7
0
02 Aug 2023
Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy
Shibo Jie
Haoqing Wang
Zhiwei Deng
76
34
0
31 Jul 2023
Lookbehind-SAM: k steps back, 1 step forward
Gonçalo Mordido
Pranshu Malviya
A. Baratin
Sarath Chandar
AAML
90
1
0
31 Jul 2023
GeneMask: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning
Soumyadeep Roy
Jonas Wallat
Sowmya S. Sundaram
Wolfgang Nejdl
Niloy Ganguly
59
3
0
29 Jul 2023
Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation
Jianan Fan
Dongnan Liu
Hang Chang
Heng-Chiao Huang
Mei Chen
Weidong (Tom) Cai
OOD
91
9
0
27 Jul 2023
Modify Training Directions in Function Space to Reduce Generalization Error
Yi Yu
Wenlian Lu
Boyu Chen
71
0
0
25 Jul 2023
The instabilities of large learning rate training: a loss landscape view
Lawrence Wang
Stephen J. Roberts
17
2
0
22 Jul 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen
Zhiyuan Li
Tengyu Ma
FAtt
98
29
0
20 Jul 2023
Flatness-Aware Minimization for Domain Generalization
Xingxuan Zhang
Renzhe Xu
Han Yu
Yancheng Dong
Pengfei Tian
Peng Cui
83
22
0
20 Jul 2023
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Pranshu Malviya
Gonçalo Mordido
A. Baratin
Reza Babanezhad Harikandeh
Jerry Huang
Simon Lacoste-Julien
Razvan Pascanu
Sarath Chandar
ODL
36
1
0
18 Jul 2023
Sharpness-Aware Graph Collaborative Filtering
Huiyuan Chen
Chin-Chia Michael Yeh
Yujie Fan
Yan Zheng
Junpeng Wang
Vivian Lai
Mahashweta Das
Hao Yang
75
5
0
18 Jul 2023
Snapshot Spectral Clustering -- a costless approach to deep clustering ensembles generation
Adam Piróg
Halina Kwasnicka
44
1
0
17 Jul 2023
DOT: A Distillation-Oriented Trainer
Borui Zhao
Quan Cui
Renjie Song
Jiajun Liang
55
7
0
17 Jul 2023
Accelerating Distributed ML Training via Selective Synchronization
S. Tyagi
Martin Swany
FedML
84
4
0
16 Jul 2023
The Interpolating Information Criterion for Overparameterized Models
Liam Hodgkinson
Christopher van der Heide
Roberto Salomone
Fred Roosta
Michael W. Mahoney
72
9
0
15 Jul 2023
Variance-reduced accelerated methods for decentralized stochastic double-regularized nonconvex strongly-concave minimax problems
Gabriel Mancino-Ball
Yangyang Xu
116
8
0
14 Jul 2023
Memorization Through the Lens of Curvature of Loss Function Around Samples
Isha Garg
Deepak Ravikumar
Kaushik Roy
TDI
65
13
0
11 Jul 2023
Implicit regularisation in stochastic gradient descent: from single-objective to two-player games
Mihaela Rosca
M. Deisenroth
58
2
0
11 Jul 2023
On the curvature of the loss landscape
Alison Pouplin
Hrittik Roy
Sidak Pal Singh
Georgios Arvanitidis
54
1
0
10 Jul 2023
Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness
C. Hartmann
Lorenz Richter
AAML
52
2
0
05 Jul 2023
FAM: Relative Flatness Aware Minimization
Linara Adilova
Amr Abourayya
Jianning Li
Amin Dada
Henning Petzka
Jan Egger
Jens Kleesiek
Michael Kamp
ODL
47
1
0
05 Jul 2023
CAME: Confidence-guided Adaptive Memory Efficient Optimization
Yang Luo
Xiaozhe Ren
Zangwei Zheng
Zhuo Jiang
Xin Jiang
Yang You
ODL
84
22
0
05 Jul 2023
Sparsity-aware generalization theory for deep neural networks
Ramchandran Muthukumar
Jeremias Sulam
MLT
42
7
0
01 Jul 2023
Towards Brain Inspired Design for Addressing the Shortcomings of ANNs
F. Sarfraz
Elahe Arani
Bahram Zonooz
31
1
0
30 Jun 2023
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Peng Mi
Li Shen
Tianhe Ren
Yiyi Zhou
Tianshuo Xu
Xiaoshuai Sun
Tongliang Liu
Rongrong Ji
Dacheng Tao
AAML
63
2
0
30 Jun 2023
The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks
Mor Shpigel Nacson
Rotem Mulayoff
Greg Ongie
T. Michaeli
Daniel Soudry
84
13
0
30 Jun 2023
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses
Jeongmin Brian Park
Vikram Sharma Mailthody
Zaid Qureshi
Wen-mei W. Hwu
GNN
75
13
0
28 Jun 2023
Black holes and the loss landscape in machine learning
P. Kumar
Taniya Mandal
Swapnamay Mondal
64
2
0
26 Jun 2023
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
Anna Bair
Hongxu Yin
Maying Shen
Pavlo Molchanov
J. Álvarez
99
12
0
25 Jun 2023
BatchGNN: Efficient CPU-Based Distributed GNN Training on Very Large Graphs
Loc Hoang
Rita Brugarolas Brufau
Ke Ding
Bo Wu
GNN
61
2
0
23 Jun 2023
Scaling MLPs: A Tale of Inductive Bias
Gregor Bachmann
Sotiris Anagnostidis
Thomas Hofmann
101
38
0
23 Jun 2023
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Pascal Junior Tikeng Notsawo
Hattie Zhou
Mohammad Pezeshki
Irina Rish
G. Dumas
100
24
0
23 Jun 2023
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization
Khashayar Gatmiry
Zhiyuan Li
Ching-Yao Chuang
Sashank J. Reddi
Tengyu Ma
Stefanie Jegelka
ODL
77
12
0
22 Jun 2023
PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning
Hojoon Lee
Hanseul Cho
Hyunseung Kim
Daehoon Gwak
Joonkee Kim
Jaegul Choo
Se-Young Yun
Chulhee Yun
OffRL
157
30
0
19 Jun 2023
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Guanhua Wang
Heyang Qin
S. A. Jacobs
Connor Holmes
Samyam Rajbhandari
Olatunji Ruwase
Feng Yan
Lei Yang
Yuxiong He
VLM
110
59
0
16 Jun 2023
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Dongkuk Si
Chulhee Yun
108
15
0
16 Jun 2023
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas
Depen Morwani
Rosie Zhao
Gal Kaplun
Sham Kakade
Boaz Barak
MLT
79
4
0
14 Jun 2023
Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
Shahar Stein Ioushua
Inbar Hasidim
O. Shayevitz
M. Feder
54
0
0
14 Jun 2023
Exact Mean Square Linear Stability Analysis for SGD
Rotem Mulayoff
T. Michaeli
MLT
59
2
0
13 Jun 2023
Unveiling the Hessian's Connection to the Decision Boundary
Mahalakshmi Sabanayagam
Freya Behrens
Urte Adomaityte
Anna Dawid
54
5
0
12 Jun 2023
Gradient Ascent Post-training Enhances Language Model Generalization
Dongkeun Yoon
Joel Jang
Sungdong Kim
Minjoon Seo
VLM, AI4CE
77
3
0
12 Jun 2023
An information-Theoretic Approach to Semi-supervised Transfer Learning
Daniel Jakubovitz
David Uliel
Miguel R. D. Rodrigues
Raja Giryes
56
1
0
11 Jun 2023
Differentially Private Sharpness-Aware Training
Jinseong Park
Hoki Kim
Yujin Choi
Jaewook Lee
83
8
0
09 Jun 2023
Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn
B. Rosenow
76
3
0
08 Jun 2023
Boosting Adversarial Transferability by Achieving Flat Local Maxima
Zhijin Ge
Hongying Liu
Xiaosen Wang
Fanhua Shang
Yuanyuan Liu
AAML
84
48
0
08 Jun 2023