1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,653 papers shown
Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li
Junyu Liu
Hanrui Wang
Tianlong Chen
595
2
0
30 Apr 2024
Grad Queue : A probabilistic framework to reinforce sparse gradients
Irfan Mohammad Al Hasib
222
0
0
25 Apr 2024
Generalization Measures for Zero-Shot Cross-Lingual Transfer
Saksham Bassi
Duygu Ataman
Kyunghyun Cho
216
0
0
24 Apr 2024
A Hybrid Generative and Discriminative PointNet on Unordered Point Sets
Yang Ye
Shihao Ji
PINN
3DPC
247
1
0
19 Apr 2024
Singular-limit analysis of gradient descent with noise injection
Anna Shalova
André Schlichting
M. Peletier
222
5
0
18 Apr 2024
QGen: On the Ability to Generalize in Quantization Aware Training
Mohammadhossein Askarihemmat
Ahmadreza Jeddi
Reyhane Askari Hemmat
Ivan Lazarevich
Alexander Hoffman
Sudhakar Sah
Ehsan Saboori
Yvon Savaria
Jean-Pierre David
MQ
278
5
0
17 Apr 2024
Flatness Improves Backbone Generalisation in Few-shot Classification
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Rui Li
Martin Trapp
Talal Alrawajfeh
Arno Solin
444
1
0
11 Apr 2024
Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity
Zhanran Lin
Puheng Li
Lei Wu
473
9
0
09 Apr 2024
Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications
Lucas Böttcher
Gregory R. Wheeler
326
0
0
05 Apr 2024
Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks
Mohammed Ghaith Altarabichi
Sławomir Nowaczyk
Sepideh Pashami
Peyman Sheikholharam Mashhadi
Julia Handl
169
25
0
05 Apr 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2024
Haiyun He
Christina Lee Yu
443
10
0
04 Apr 2024
Make Continual Learning Stronger via C-Flat
Ang Bian
Wei Li
Hangjie Yuan
Chengrong Yu
Zixiang Zhao
Mang Wang
Aojun Lu
Tao Feng
271
25
0
01 Apr 2024
Revisiting Random Weight Perturbation for Efficiently Improving Generalization
Tao Li
Qinghua Tao
Weihao Yan
Zehao Lei
Yingwen Wu
Kun Fang
Mingzhen He
Xiaolin Huang
AAML
388
11
0
30 Mar 2024
Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context
Tuan Nguyen
C. Fredouille
A. Ghio
M. Balaguer
Virginie Woisard
156
3
0
29 Mar 2024
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang
Sangdoo Yun
Dongyoon Han
OODD
MoMe
416
73
0
28 Mar 2024
On the Benefits of Over-parameterization for Out-of-Distribution Generalization
Yifan Hao
Yong Lin
Difan Zou
Tong Zhang
OODD
OOD
246
6
0
26 Mar 2024
Self-Supervised Multi-Frame Neural Scene Flow
Dongrui Liu
Daqi Liu
Xueqian Li
Sihao Lin
Hongwei Xie
Bing Wang
Xiaojun Chang
Lei Chu
407
3
0
24 Mar 2024
SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images
Yifei Wang
Chuhong Zhu
275
0
0
24 Mar 2024
Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning
Tausifa Jan Saleem
Ramanjit Ahuja
Surendra Prasad
Brejesh Lall
320
0
0
22 Mar 2024
Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
Anh-Vu Bui
Vy Vo
Tung Pham
Dinh Q. Phung
Trung Le
FedML
UQCV
264
1
0
19 Mar 2024
Friendly Sharpness-Aware Minimization
Computer Vision and Pattern Recognition (CVPR), 2024
Tao Li
Pan Zhou
Zhengbao He
Xinwen Cheng
Xiaolin Huang
AAML
274
36
0
19 Mar 2024
Semiparametric Token-Sequence Co-Supervision
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hyunji Lee
Doyoung Kim
Jihoon Jun
Se June Joo
Joel Jang
Kyoung-Woon On
Minjoon Seo
276
0
0
14 Mar 2024
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé
P. D'Oro
Evgenii Nikishin
Razvan Pascanu
Pierre-Luc Bacon
A. Baratin
365
4
0
12 Mar 2024
Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Junseok Park
Yoonsung Kim
Hee Bin Yoo
Min Whoo Lee
Kibeom Kim
Won-Seok Choi
Minsu Lee
Byoung-Tak Zhang
OffRL
224
2
0
11 Mar 2024
Transformers Learn Low Sensitivity Functions: Investigations and Implications
International Conference on Learning Representations (ICLR), 2024
Bhavya Vasudeva
Deqing Fu
Tianyi Zhou
Elliott Kau
Youqi Huang
Willie Neiswanger
470
7
0
11 Mar 2024
CarbonNet: How Computer Vision Plays a Role in Climate Change? Application: Learning Geomechanics from Subsurface Geometry of CCS to Mitigate Global Warming
Journal of Robotics and Automation Research (JRAR), 2024
Wei Chen
Yun Li
Yuan Tian
AI4CE
223
0
0
09 Mar 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato
Stavroula Mougiakakou
213
1
0
08 Mar 2024
GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models
Tolga Dimlioglu
A. Choromańska
240
6
0
07 Mar 2024
Non-Convex Stochastic Composite Optimization with Polyak Momentum
Yuan Gao
Anton Rodomanov
Sebastian U. Stich
304
13
0
05 Mar 2024
Level Set Teleportation: An Optimization Perspective
Aaron Mishkin
A. Bietti
Robert Mansel Gower
313
1
0
05 Mar 2024
A Survey on Evaluation of Out-of-Distribution Generalization
Han Yu
Tianyu Wang
Xingxuan Zhang
Jiayun Wu
Peng Cui
OOD
323
22
0
04 Mar 2024
Merging Text Transformer Models from Different Initializations
Neha Verma
Maha Elbayad
MoMe
371
12
0
01 Mar 2024
Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms
Toki Tahmid Inan
Mingrui Liu
Amarda Shehu
223
0
0
01 Mar 2024
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou
Yicong Liu
Yiman Hu
Yuhua Li
Ruixuan Li
235
23
0
01 Mar 2024
Fine-tuning with Very Large Dropout
Jianyu Zhang
Léon Bottou
400
9
0
01 Mar 2024
Batch size invariant Adam
Xi Wang
Laurence Aitchison
261
4
0
29 Feb 2024
Gradient Alignment for Cross-Domain Face Anti-Spoofing
B. Le
Simon S. Woo
CVBM
402
36
0
29 Feb 2024
Pre-training Differentially Private Models with Limited Public Data
Zhiqi Bu
Xinwei Zhang
Mingyi Hong
Sheng Zha
George Karypis
302
6
0
28 Feb 2024
Unveiling Privacy, Memorization, and Input Curvature Links
Deepak Ravikumar
Efstathia Soufleri
Abolfazl Hashemi
Kaushik Roy
297
13
0
28 Feb 2024
Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem
Samuel J. K. Chin
Matthias Winkenbach
Akash Srivastava
190
0
0
28 Feb 2024
Layer-wise Regularized Dropout for Neural Language Models
Shiwen Ni
Min Yang
Ruifeng Xu
Chengming Li
Xiping Hu
126
0
0
26 Feb 2024
Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Tian Wang
212
1
0
24 Feb 2024
Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization
Zirui Zhu
Yong Liu
Zangwei Zheng
Huifeng Guo
Yang You
149
0
0
23 Feb 2024
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Yihao Zhang
Hangzhou He
Jingyu Zhu
Huanran Chen
Yifei Wang
Zeming Wei
AAML
390
24
0
23 Feb 2024
NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning
Dhananjay Saikumar
Blesson Varghese
238
2
0
21 Feb 2024
Investigating the Histogram Loss in Regression
Ehsan Imani
Kai Luedemann
Sam Scholnick-Hughes
Esraa Elelimy
Martha White
UQCV
165
10
0
20 Feb 2024
Scaling physics-informed hard constraints with mixture-of-experts
N. Chalapathi
Yiheng Du
Aditi Krishnapriyan
AI4CE
231
27
0
20 Feb 2024
OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations
Yao Shu
Jiongfeng Fang
Y. He
Fei Richard Yu
165
0
0
18 Feb 2024
AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau
Han Liu
Mladen Kolar
ODL
402
9
0
17 Feb 2024
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Romain Ilbert
Ambroise Odonnat
Vasilii Feofanov
Aladin Virmaux
Giuseppe Paolo
Themis Palpanas
I. Redko
AI4TS
311
54
0
15 Feb 2024