Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1609.04836
Cited By
v1
v2 (latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Title
MGSER-SAM: Memory-Guided Soft Experience Replay with Sharpness-Aware Optimization for Enhanced Continual Learning
Xingyu Li
Bo Tang
VLM
CLL
64
0
0
15 May 2024
Why is SAM Robust to Label Noise?
Christina Baek
Zico Kolter
Aditi Raghunathan
NoLa
AAML
94
11
0
06 May 2024
Loss Jump During Loss Switch in Solving PDEs with Neural Networks
Zhiwei Wang
Lulu Zhang
Zhongwang Zhang
Z. Xu
58
0
0
06 May 2024
A separability-based approach to quantifying generalization: which layer is best?
Luciano Dyballa
Evan Gerritz
Steven W. Zucker
OOD
104
4
0
02 May 2024
PackVFL: Efficient HE Packing for Vertical Federated Learning
Liu Yang
Shuowei Cai
Di Chai
Junxue Zhang
Han Tian
Yilun Jin
Kun Guo
Kai Chen
Qiang Yang
FedML
58
1
0
01 May 2024
Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li
Junyu Liu
Hanrui Wang
Tianlong Chen
199
2
0
30 Apr 2024
Grad Queue : A probabilistic framework to reinforce sparse gradients
Irfan Mohammad Al Hasib
78
0
0
25 Apr 2024
Generalization Measures for Zero-Shot Cross-Lingual Transfer
Saksham Bassi
Duygu Ataman
Kyunghyun Cho
89
0
0
24 Apr 2024
A Hybrid Generative and Discriminative PointNet on Unordered Point Sets
Yang Ye
Shihao Ji
PINN
3DPC
70
0
0
19 Apr 2024
Singular-limit analysis of gradient descent with noise injection
Anna Shalova
André Schlichting
M. Peletier
59
2
0
18 Apr 2024
QGen: On the Ability to Generalize in Quantization Aware Training
Mohammadhossein Askarihemmat
Ahmadreza Jeddi
Reyhane Askari Hemmat
Ivan Lazarevich
Alexander Hoffman
Sudhakar Sah
Ehsan Saboori
Yvon Savaria
Jean-Pierre David
MQ
89
1
0
17 Apr 2024
Flatness Improves Backbone Generalisation in Few-shot Classification
Rui Li
Martin Trapp
Talal Alrawajfeh
Arno Solin
111
0
0
11 Apr 2024
Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity
Zhanran Lin
Puheng Li
Lei Wu
255
9
0
09 Apr 2024
Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications
Lucas Böttcher
Gregory R. Wheeler
77
0
0
05 Apr 2024
Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks
Mohammed Ghaith Altarabichi
Sławomir Nowaczyk
Sepideh Pashami
Peyman Sheikholharam Mashhadi
Julia Handl
40
11
0
05 Apr 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks
Haiyun He
Christina Lee Yu
99
6
0
04 Apr 2024
Make Continual Learning Stronger via C-Flat
Ang Bian
Wei Li
Hangjie Yuan
Chengrong Yu
Zixiang Zhao
Mang Wang
Aojun Lu
Tao Feng
77
12
0
01 Apr 2024
Revisiting Random Weight Perturbation for Efficiently Improving Generalization
Tao Li
Qinghua Tao
Weihao Yan
Zehao Lei
Yingwen Wu
Kun Fang
Mingzhen He
Xiaolin Huang
AAML
101
6
0
30 Mar 2024
Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context
Tuan Nguyen
C. Fredouille
A. Ghio
M. Balaguer
Virginie Woisard
38
1
0
29 Mar 2024
On the Benefits of Over-parameterization for Out-of-Distribution Generalization
Yifan Hao
Yong Lin
Difan Zou
Tong Zhang
OODD
OOD
88
6
0
26 Mar 2024
Self-Supervised Multi-Frame Neural Scene Flow
Dongrui Liu
Daqi Liu
Xueqian Li
Sihao Lin
Hongwei Xie
Bing Wang
Xiaojun Chang
Lei Chu
135
3
0
24 Mar 2024
SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images
Yifei Wang
Chuhong Zhu
85
0
0
24 Mar 2024
Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning
Tausifa Jan Saleem
Ramanjit Ahuja
Surendra Prasad
Brejesh Lall
84
0
0
22 Mar 2024
Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
Anh-Vu Bui
Vy Vo
Tung Pham
Dinh Q. Phung
Trung Le
FedML
UQCV
70
1
0
19 Mar 2024
Friendly Sharpness-Aware Minimization
Tao Li
Pan Zhou
Zhengbao He
Xinwen Cheng
Xiaolin Huang
AAML
80
17
0
19 Mar 2024
Semiparametric Token-Sequence Co-Supervision
Hyunji Lee
Doyoung Kim
Jihoon Jun
Se June Joo
Joel Jang
Kyoung-Woon On
Minjoon Seo
114
1
0
14 Mar 2024
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé
P. DÓro
Evgenii Nikishin
Razvan Pascanu
Pierre-Luc Bacon
A. Baratin
106
1
0
12 Mar 2024
Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning
Junseok Park
Yoonsung Kim
Hee Bin Yoo
Min Whoo Lee
Kibeom Kim
Won-Seok Choi
Minsu Lee
Byoung-Tak Zhang
OffRL
68
1
0
11 Mar 2024
Transformers Learn Low Sensitivity Functions: Investigations and Implications
Bhavya Vasudeva
Deqing Fu
Tianyi Zhou
Elliott Kau
Youqi Huang
Vatsal Sharan
89
2
0
11 Mar 2024
CarbonNet: How Computer Vision Plays a Role in Climate Change? Application: Learning Geomechanics from Subsurface Geometry of CCS to Mitigate Global Warming
Wei Chen
Yun Li
Yuan Tian
AI4CE
52
0
0
09 Mar 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato
Stavroula Mougiakakou
64
0
0
08 Mar 2024
GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models
Tolga Dimlioglu
A. Choromańska
73
4
0
07 Mar 2024
Non-Convex Stochastic Composite Optimization with Polyak Momentum
Yuan Gao
Anton Rodomanov
Sebastian U. Stich
73
8
0
05 Mar 2024
Level Set Teleportation: An Optimization Perspective
Aaron Mishkin
A. Bietti
Robert Mansel Gower
98
1
0
05 Mar 2024
A Survey on Evaluation of Out-of-Distribution Generalization
Han Yu
Jiashuo Liu
Xingxuan Zhang
Jiayun Wu
Peng Cui
OOD
101
9
0
04 Mar 2024
Merging Text Transformer Models from Different Initializations
Neha Verma
Maha Elbayad
MoMe
107
8
0
01 Mar 2024
Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms
Toki Tahmid Inan
Mingrui Liu
Amarda Shehu
54
0
0
01 Mar 2024
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou
Yicong Liu
Yiman Hu
Yuhua Li
Ruixuan Li
86
7
0
01 Mar 2024
Fine-tuning with Very Large Dropout
Jianyu Zhang
Léon Bottou
124
2
0
01 Mar 2024
Batch size invariant Adam
Xi Wang
Laurence Aitchison
87
2
0
29 Feb 2024
Gradient Alignment for Cross-Domain Face Anti-Spoofing
B. Le
Simon S. Woo
CVBM
83
20
0
29 Feb 2024
Pre-training Differentially Private Models with Limited Public Data
Zhiqi Bu
Xinwei Zhang
Mingyi Hong
Sheng Zha
George Karypis
114
4
0
28 Feb 2024
Unveiling Privacy, Memorization, and Input Curvature Links
Deepak Ravikumar
Efstathia Soufleri
Abolfazl Hashemi
Kaushik Roy
97
6
0
28 Feb 2024
Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem
Samuel J. K. Chin
Matthias Winkenbach
Akash Srivastava
59
0
0
28 Feb 2024
Layer-wise Regularized Dropout for Neural Language Models
Shiwen Ni
Min Yang
Ruifeng Xu
Chengming Li
Xiping Hu
46
0
0
26 Feb 2024
Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Tian Wang
70
1
0
24 Feb 2024
Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization
Zirui Zhu
Yong Liu
Zangwei Zheng
Huifeng Guo
Yang You
45
0
0
23 Feb 2024
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Yihao Zhang
Hangzhou He
Jingyu Zhu
Huanran Chen
Yifei Wang
Zeming Wei
AAML
120
15
0
23 Feb 2024
NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning
Dhananjay Saikumar
Blesson Varghese
65
1
0
21 Feb 2024
Investigating the Histogram Loss in Regression
Ehsan Imani
Kai Luedemann
Sam Scholnick-Hughes
Esraa Elelimy
Martha White
UQCV
55
6
0
20 Feb 2024
Previous
1
2
3
4
5
6
...
30
31
32
Next