1609.04836
Cited By
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,585 papers shown
Title
Self-Supervised Multi-Frame Neural Scene Flow
Dongrui Liu
Daqi Liu
Xueqian Li
Sihao Lin
Hongwei Xie
Bing Wang
Xiaojun Chang
Lei Chu
175
3
0
24 Mar 2024
SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images
Yifei Wang
Chuhong Zhu
108
0
0
24 Mar 2024
Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning
Tausifa Jan Saleem
Ramanjit Ahuja
Surendra Prasad
Brejesh Lall
147
0
0
22 Mar 2024
Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
Anh-Vu Bui
Vy Vo
Tung Pham
Dinh Q. Phung
Trung Le
FedML
UQCV
90
1
0
19 Mar 2024
Friendly Sharpness-Aware Minimization
Tao Li
Pan Zhou
Zhengbao He
Xinwen Cheng
Xiaolin Huang
AAML
124
23
0
19 Mar 2024
Semiparametric Token-Sequence Co-Supervision
Hyunji Lee
Doyoung Kim
Jihoon Jun
Se June Joo
Joel Jang
Kyoung-Woon On
Minjoon Seo
129
1
0
14 Mar 2024
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé
P. D'Oro
Evgenii Nikishin
Razvan Pascanu
Pierre-Luc Bacon
A. Baratin
134
3
0
12 Mar 2024
Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning
Junseok Park
Yoonsung Kim
Hee Bin Yoo
Min Whoo Lee
Kibeom Kim
Won-Seok Choi
Minsu Lee
Byoung-Tak Zhang
OffRL
103
1
0
11 Mar 2024
Transformers Learn Low Sensitivity Functions: Investigations and Implications
Bhavya Vasudeva
Deqing Fu
Tianyi Zhou
Elliott Kau
Youqi Huang
Vatsal Sharan
176
4
0
11 Mar 2024
CarbonNet: How Computer Vision Plays a Role in Climate Change? Application: Learning Geomechanics from Subsurface Geometry of CCS to Mitigate Global Warming
Wei Chen
Yun Li
Yuan Tian
AI4CE
91
0
0
09 Mar 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato
Stavroula Mougiakakou
96
0
0
08 Mar 2024
GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models
Tolga Dimlioglu
A. Choromańska
100
6
0
07 Mar 2024
Non-Convex Stochastic Composite Optimization with Polyak Momentum
Yuan Gao
Anton Rodomanov
Sebastian U. Stich
111
9
0
05 Mar 2024
Level Set Teleportation: An Optimization Perspective
Aaron Mishkin
A. Bietti
Robert Mansel Gower
141
1
0
05 Mar 2024
A Survey on Evaluation of Out-of-Distribution Generalization
Han Yu
Tianyu Wang
Xingxuan Zhang
Jiayun Wu
Peng Cui
OOD
141
11
0
04 Mar 2024
Merging Text Transformer Models from Different Initializations
Neha Verma
Maha Elbayad
MoMe
181
10
0
01 Mar 2024
Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms
Toki Tahmid Inan
Mingrui Liu
Amarda Shehu
109
0
0
01 Mar 2024
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou
Yicong Liu
Yiman Hu
Yuhua Li
Ruixuan Li
100
12
0
01 Mar 2024
Fine-tuning with Very Large Dropout
Jianyu Zhang
Léon Bottou
196
4
0
01 Mar 2024
Batch size invariant Adam
Xi Wang
Laurence Aitchison
107
3
0
29 Feb 2024
Gradient Alignment for Cross-Domain Face Anti-Spoofing
B. Le
Simon S. Woo
CVBM
125
26
0
29 Feb 2024
Pre-training Differentially Private Models with Limited Public Data
Zhiqi Bu
Xinwei Zhang
Mingyi Hong
Sheng Zha
George Karypis
169
4
0
28 Feb 2024
Unveiling Privacy, Memorization, and Input Curvature Links
Deepak Ravikumar
Efstathia Soufleri
Abolfazl Hashemi
Kaushik Roy
142
8
0
28 Feb 2024
Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem
Samuel J. K. Chin
Matthias Winkenbach
Akash Srivastava
90
0
0
28 Feb 2024
Layer-wise Regularized Dropout for Neural Language Models
Shiwen Ni
Min Yang
Ruifeng Xu
Chengming Li
Xiping Hu
77
0
0
26 Feb 2024
Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Tian Wang
93
2
0
24 Feb 2024
Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization
Zirui Zhu
Yong Liu
Zangwei Zheng
Huifeng Guo
Yang You
73
0
0
23 Feb 2024
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Yihao Zhang
Hangzhou He
Jingyu Zhu
Huanran Chen
Yifei Wang
Zeming Wei
AAML
160
17
0
23 Feb 2024
NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning
Dhananjay Saikumar
Blesson Varghese
76
1
0
21 Feb 2024
Investigating the Histogram Loss in Regression
Ehsan Imani
Kai Luedemann
Sam Scholnick-Hughes
Esraa Elelimy
Martha White
UQCV
85
7
0
20 Feb 2024
Scaling physics-informed hard constraints with mixture-of-experts
N. Chalapathi
Yiheng Du
Aditi Krishnapriyan
AI4CE
145
18
0
20 Feb 2024
OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations
Yao Shu
Jiongfeng Fang
Y. He
Fei Richard Yu
88
0
0
18 Feb 2024
AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau
Han Liu
Mladen Kolar
ODL
126
8
0
17 Feb 2024
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Romain Ilbert
Ambroise Odonnat
Vasilii Feofanov
Aladin Virmaux
Giuseppe Paolo
Themis Palpanas
I. Redko
AI4TS
179
37
0
15 Feb 2024
Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training
Tom Sander
Maxime Sylvestre
Alain Durmus
84
1
0
13 Feb 2024
Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors
D. Sahabandu
Xiaojun Xu
Arezoo Rajabi
Luyao Niu
Bhaskar Ramasubramanian
Bo Li
Radha Poovendran
AAML
88
1
0
12 Feb 2024
AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size
P. Ostroukhov
Aigerim Zhumabayeva
Chulu Xiang
Alexander Gasnikov
Martin Takáč
Dmitry Kamzolov
ODL
106
2
0
07 Feb 2024
Strong convexity-guided hyper-parameter optimization for flatter losses
Rahul Yedida
Snehanshu Saha
132
0
0
07 Feb 2024
Curvature-Informed SGD via General Purpose Lie-Group Preconditioners
Omead Brandon Pooladzandi
Xi-Lin Li
113
8
0
07 Feb 2024
Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation
Ossi Raisa
Hibiki Ito
Antti Honkela
104
6
0
06 Feb 2024
Deconstructing the Goldilocks Zone of Neural Network Initialization
Artem Vysogorets
Anna Dawid
Julia Kempe
99
1
0
05 Feb 2024
Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
Naoki Sato
Hideaki Iiduka
ODL
102
1
0
04 Feb 2024
BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning
Baoyuan Wu
Hongrui Chen
Ruotong Wang
Zihao Zhu
Shaokui Wei
Danni Yuan
Mingli Zhu
Ke Xu
Li Liu
Chaoxiao Shen
AAML
ELM
151
13
0
26 Jan 2024
Catch-Up Mix: Catch-Up Class for Struggling Filters in CNN
Minsoo Kang
Minkoo Kang
Suhyun Kim
57
7
0
24 Jan 2024
DALex: Lexicase-like Selection via Diverse Aggregation
Andrew Ni
Lijie Ding
Lee Spector
155
8
0
23 Jan 2024
A Precise Characterization of SGD Stability Using Loss Surface Geometry
Gregory Dexter
Borja Ocejo
S. Keerthi
Aman Gupta
Ayan Acharya
Rajiv Khanna
MLT
117
0
0
22 Jan 2024
Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data
Leonardo Castro-Gonzalez
Yi-Ling Chung
Hannah Rose Kirk
John Francis
Angus R. Williams
Pica Johansson
Jonathan Bright
122
2
0
22 Jan 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
226
7
0
22 Jan 2024
Understanding the Generalization Benefits of Late Learning Rate Decay
Yinuo Ren
Chao Ma
Lexing Ying
AI4CE
98
7
0
21 Jan 2024
The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness
Yifan Hao
Tong Zhang
AAML
203
5
0
19 Jan 2024