1609.04836
Cited By
v1
v2 (latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Title
Inductive biases in deep learning models for weather prediction
Jannik Thümmel
Matthias Karlbauer
S. Otte
C. Zarfl
Georg Martius
...
Thomas Scholten
Ulrich Friedrich
V. Wulfmeyer
B. Goswami
Martin Volker Butz
AI4CE
107
6
0
06 Apr 2023
Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability
Martin Gubri
Maxime Cordy
Yves Le Traon
AAML
82
3
1
05 Apr 2023
Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
Haoyi Xiong
Xuhong Li
Bo Yu
Zhanxing Zhu
Dongrui Wu
Dejing Dou
NoLa
43
0
0
01 Apr 2023
Solving Regularized Exp, Cosh and Sinh Regression Problems
Zhihang Li
Zhao Song
Dinesh Manocha
88
39
0
28 Mar 2023
Learning Rate Schedules in the Presence of Distribution Shift
Matthew Fahrbach
Adel Javanmard
Vahab Mirrokni
Pratik Worah
62
7
0
27 Mar 2023
Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation
Tianli Zhang
Mengqi Xue
Jiangtao Zhang
Haofei Zhang
Yu Wang
Lechao Cheng
Mingli Song
61
6
0
26 Mar 2023
Mathematical Challenges in Deep Learning
V. Nia
Guojun Zhang
I. Kobyzev
Michael R. Metel
Xinlin Li
...
S. Hemati
M. Asgharian
Linglong Kong
Wulong Liu
Boxing Chen
AI4CE
VLM
63
1
0
24 Mar 2023
Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
Zhuo Huang
Miaoxi Zhu
Xiaobo Xia
Li Shen
Jun Yu
Chen Gong
Bo Han
Bo Du
Tongliang Liu
78
36
0
23 Mar 2023
Decentralized Adversarial Training over Graphs
Ying Cao
Elsa Rizk
Stefan Vlaski
Ali H. Sayed
AAML
161
1
0
23 Mar 2023
Physics-informed PointNet: On how many irregular geometries can it solve an inverse problem simultaneously? Application to linear elasticity
Ali Kashefi
Leonidas Guibas
T. Mukerji
PINN
3DPC
AI4CE
93
10
0
22 Mar 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE
MedIm
114
5
0
22 Mar 2023
Randomized Adversarial Training via Taylor Expansion
Gao Jin
Xinping Yi
Dengyu Wu
Ronghui Mu
Xiaowei Huang
AAML
111
37
0
19 Mar 2023
Experimenting with Normalization Layers in Federated Learning on non-IID scenarios
Bruno Casella
Roberto Esposito
A. Sciarappa
C. Cavazzoni
Marco Aldinucci
FedML
48
16
0
19 Mar 2023
Sharpness-Aware Gradient Matching for Domain Generalization
Pengfei Wang
Zhaoxiang Zhang
Zhen Lei
Lei Zhang
73
95
0
18 Mar 2023
Hierarchical Prior Mining for Non-local Multi-View Stereo
Chunlin Ren
Qingshan Xu
Shikun Zhang
Jiaqi Yang
3DV
79
7
0
17 Mar 2023
Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift
Francisco Pérez-Galarce
K. Pichara
P. Huijse
M. Catelán
D. Méry
45
0
0
12 Mar 2023
Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap
Weiyang Liu
L. Yu
Adrian Weller
Bernhard Schölkopf
90
18
0
11 Mar 2023
Revisiting the Noise Model of Stochastic Gradient Descent
Barak Battash
Ofir Lindenbaum
56
11
0
05 Mar 2023
What Is Missing in IRM Training and Evaluation? Challenges and Solutions
Yihua Zhang
Pranay Sharma
Parikshit Ram
Mingyi Hong
Kush R. Varshney
Sijia Liu
84
13
0
04 Mar 2023
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
Xingxuan Zhang
Renzhe Xu
Han Yu
Hao Zou
Peng Cui
77
41
0
03 Mar 2023
Over-training with Mixup May Hurt Generalization
Zixuan Liu
Ziqiao Wang
Hongyu Guo
Yongyi Mao
NoLa
90
11
0
02 Mar 2023
How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy
Natalia Ponomareva
Hussein Hazimeh
Alexey Kurakin
Zheng Xu
Carson E. Denison
H. B. McMahan
Sergei Vassilvitskii
Steve Chien
Abhradeep Thakurta
156
183
0
01 Mar 2023
AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks
Hao Sun
Li Shen
Qihuang Zhong
Liang Ding
Shi-Yong Chen
Jingwei Sun
Jing Li
Guangzhong Sun
Dacheng Tao
98
34
0
01 Mar 2023
ASP: Learn a Universal Neural Solver!
Chenguang Wang
Zhouliang Yu
Stephen Marcus McAleer
Tianshu Yu
Yao-Chun Yang
AAML
114
26
0
01 Mar 2023
DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
Samyak Jain
Sravanti Addepalli
P. Sahu
Priyam Dey
R. Venkatesh Babu
MoMe
OOD
118
20
0
28 Feb 2023
Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width
Dayal Singh Kalra
M. Barkeshli
131
9
0
23 Feb 2023
On Statistical Properties of Sharpness-Aware Minimization: Provable Guarantees
Kayhan Behdin
Rahul Mazumder
118
6
0
23 Feb 2023
Learning to Generalize Provably in Learning to Optimize
Junjie Yang
Tianlong Chen
Mingkang Zhu
Fengxiang He
Dacheng Tao
Yitao Liang
Zhangyang Wang
72
7
0
22 Feb 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin
Qingquan Song
Aman Gupta
S. Keerthi
Ayan Acharya
Borja Ocejo
Gregory Dexter
Rajiv Khanna
D. Durfee
Rahul Mazumder
AAML
59
7
0
19 Feb 2023
Stationary Point Losses for Robust Model
Weiwei Gao
Dazhi Zhang
Yao Li
Zhichang Guo
Ovanes Petrosian
OOD
100
0
0
19 Feb 2023
Why is parameter averaging beneficial in SGD? An objective smoothing perspective
Atsushi Nitanda
Ryuhei Kikuchi
Shugo Maeda
Denny Wu
FedML
49
0
0
18 Feb 2023
MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning
Caoyun Fan
Wenqing Chen
Jidong Tian
Yitian Li
Hao He
Yaohui Jin
40
2
0
18 Feb 2023
Invertible Neural Skinning
Yash Kant
Aliaksandr Siarohin
R. A. Guler
Menglei Chai
Jian Ren
Sergey Tulyakov
Igor Gilitschenski
3DH
71
2
0
18 Feb 2023
(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability
Mathieu Even
Scott Pesme
Suriya Gunasekar
Nicolas Flammarion
83
18
0
17 Feb 2023
SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
Atish Agarwala
Yann N. Dauphin
67
21
0
17 Feb 2023
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
Minghao Li
Ran Ben-Basat
S. Vargaftik
Chon-In Lao
Ke Xu
Michael Mitzenmacher
Minlan Yu
94
19
0
16 Feb 2023
The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
Agustinus Kristiadi
Felix Dangel
Philipp Hennig
73
14
0
14 Feb 2023
A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko
Francesco Croce
Maximilian Müller
Matthias Hein
Nicolas Flammarion
3DH
131
63
0
14 Feb 2023
Revisiting Weighted Aggregation in Federated Learning with Neural Networks
Zexi Li
Tao R. Lin
Xinyi Shang
Chao-Xiang Wu
FedML
102
65
0
14 Feb 2023
Symbolic Discovery of Optimization Algorithms
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
174
381
0
13 Feb 2023
Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization
Hamidreza Almasi
Harshit Mishra
Balajee Vamanan
Sathya Ravi
FedML
51
0
0
12 Feb 2023
Data efficiency and extrapolation trends in neural network interatomic potentials
Joshua A Vita
Daniel Schwalbe-Koda
73
17
0
12 Feb 2023
Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training
Siddharth Singh
A. Bhatele
69
9
0
10 Feb 2023
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
Vladimir Feinberg
Xinyi Chen
Y. Jennifer Sun
Rohan Anil
Elad Hazan
101
13
0
07 Feb 2023
Generalization Bounds with Data-dependent Fractal Dimensions
Benjamin Dupuis
George Deligiannidis
Umut Şimşekli
AI4CE
67
12
0
06 Feb 2023
Flat Seeking Bayesian Neural Networks
Van-Anh Nguyen
L. Vuong
Hoang Phan
Thanh-Toan Do
Dinh Q. Phung
Trung Le
BDL
100
10
0
06 Feb 2023
On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca
Yan Wu
Chongli Qin
Benoit Dherin
75
10
0
03 Feb 2023
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko
Seungjoon Park
Minchan Jeong
S. Hong
Euijai Ahn
Duhyeuk Chang
Se-Young Yun
67
6
0
03 Feb 2023
Anderson Acceleration For Bioinformatics-Based Machine Learning
Sarwan Ali
Prakash Chourasia
Murray Patterson
60
2
0
01 Feb 2023
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Antonio Sclocchi
Mario Geiger
Matthieu Wyart
64
6
0
31 Jan 2023