On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown
Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li
Junyu Liu
Hanrui Wang
Tianlong Chen
595
2
0
30 Apr 2024
Grad Queue : A probabilistic framework to reinforce sparse gradients
Irfan Mohammad Al Hasib
222
0
0
25 Apr 2024
Generalization Measures for Zero-Shot Cross-Lingual Transfer
Saksham Bassi
Duygu Ataman
Kyunghyun Cho
216
0
0
24 Apr 2024
A Hybrid Generative and Discriminative PointNet on Unordered Point Sets
Yang Ye
Shihao Ji
PINN
3DPC
247
1
0
19 Apr 2024
Singular-limit analysis of gradient descent with noise injection
Anna Shalova
André Schlichting
M. Peletier
222
5
0
18 Apr 2024
QGen: On the Ability to Generalize in Quantization Aware Training
Mohammadhossein Askarihemmat
Ahmadreza Jeddi
Reyhane Askari Hemmat
Ivan Lazarevich
Alexander Hoffman
Sudhakar Sah
Ehsan Saboori
Yvon Savaria
Jean-Pierre David
MQ
278
5
0
17 Apr 2024
Flatness Improves Backbone Generalisation in Few-shot Classification
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Rui Li
Martin Trapp
Talal Alrawajfeh
Arno Solin
444
1
0
11 Apr 2024
Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity
Zhanran Lin
Puheng Li
Lei Wu
473
9
0
09 Apr 2024
Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications
Lucas Böttcher
Gregory R. Wheeler
326
0
0
05 Apr 2024
Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks
Mohammed Ghaith Altarabichi
Sławomir Nowaczyk
Sepideh Pashami
Peyman Sheikholharam Mashhadi
Julia Handl
169
25
0
05 Apr 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2024
Haiyun He
Christina Lee Yu
443
10
0
04 Apr 2024
Make Continual Learning Stronger via C-Flat
Ang Bian
Wei Li
Hangjie Yuan
Chengrong Yu
Zixiang Zhao
Mang Wang
Aojun Lu
Tao Feng
271
25
0
01 Apr 2024
Revisiting Random Weight Perturbation for Efficiently Improving Generalization
Tao Li
Qinghua Tao
Weihao Yan
Zehao Lei
Yingwen Wu
Kun Fang
Mingzhen He
Xiaolin Huang
AAML
388
11
0
30 Mar 2024
Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context
Tuan Nguyen
C. Fredouille
A. Ghio
M. Balaguer
Virginie Woisard
156
3
0
29 Mar 2024
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang
Sangdoo Yun
Dongyoon Han
OODD
MoMe
416
73
0
28 Mar 2024
On the Benefits of Over-parameterization for Out-of-Distribution Generalization
Yifan Hao
Yong Lin
Difan Zou
Tong Zhang
OODD
OOD
246
6
0
26 Mar 2024
Self-Supervised Multi-Frame Neural Scene Flow
Dongrui Liu
Daqi Liu
Xueqian Li
Sihao Lin
Hongwei Xie
Bing Wang
Xiaojun Chang
Lei Chu
407
3
0
24 Mar 2024
SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images
Yifei Wang
Chuhong Zhu
275
0
0
24 Mar 2024
Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning
Tausifa Jan Saleem
Ramanjit Ahuja
Surendra Prasad
Brejesh Lall
320
0
0
22 Mar 2024
Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
Anh-Vu Bui
Vy Vo
Tung Pham
Dinh Q. Phung
Trung Le
FedML
UQCV
264
1
0
19 Mar 2024
Friendly Sharpness-Aware Minimization
Computer Vision and Pattern Recognition (CVPR), 2024
Tao Li
Pan Zhou
Zhengbao He
Xinwen Cheng
Xiaolin Huang
AAML
274
36
0
19 Mar 2024
Semiparametric Token-Sequence Co-Supervision
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hyunji Lee
Doyoung Kim
Jihoon Jun
Se June Joo
Joel Jang
Kyoung-Woon On
Minjoon Seo
276
0
0
14 Mar 2024
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé
P. D'Oro
Evgenii Nikishin
Razvan Pascanu
Pierre-Luc Bacon
A. Baratin
365
4
0
12 Mar 2024
Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Junseok Park
Yoonsung Kim
Hee Bin Yoo
Min Whoo Lee
Kibeom Kim
Won-Seok Choi
Minsu Lee
Byoung-Tak Zhang
OffRL
224
2
0
11 Mar 2024
Transformers Learn Low Sensitivity Functions: Investigations and Implications
International Conference on Learning Representations (ICLR), 2024
Bhavya Vasudeva
Deqing Fu
Tianyi Zhou
Elliott Kau
Youqi Huang
Willie Neiswanger
470
7
0
11 Mar 2024
CarbonNet: How Computer Vision Plays a Role in Climate Change? Application: Learning Geomechanics from Subsurface Geometry of CCS to Mitigate Global Warming
Journal of Robotics and Automation Research (JRAR), 2024
Wei Chen
Yun Li
Yuan Tian
AI4CE
223
0
0
09 Mar 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato
Stavroula Mougiakakou
213
1
0
08 Mar 2024
GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models
Tolga Dimlioglu
A. Choromańska
240
6
0
07 Mar 2024
Non-Convex Stochastic Composite Optimization with Polyak Momentum
Yuan Gao
Anton Rodomanov
Sebastian U. Stich
304
13
0
05 Mar 2024
Level Set Teleportation: An Optimization Perspective
Aaron Mishkin
A. Bietti
Robert Mansel Gower
313
1
0
05 Mar 2024
A Survey on Evaluation of Out-of-Distribution Generalization
Han Yu
Tianyu Wang
Xingxuan Zhang
Jiayun Wu
Peng Cui
OOD
323
22
0
04 Mar 2024
Merging Text Transformer Models from Different Initializations
Neha Verma
Maha Elbayad
MoMe
371
12
0
01 Mar 2024
Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms
Toki Tahmid Inan
Mingrui Liu
Amarda Shehu
223
0
0
01 Mar 2024
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou
Yicong Liu
Yiman Hu
Yuhua Li
Ruixuan Li
235
23
0
01 Mar 2024
Fine-tuning with Very Large Dropout
Jianyu Zhang
Léon Bottou
400
9
0
01 Mar 2024
Batch size invariant Adam
Xi Wang
Laurence Aitchison
261
4
0
29 Feb 2024
Gradient Alignment for Cross-Domain Face Anti-Spoofing
B. Le
Simon S. Woo
CVBM
402
36
0
29 Feb 2024
Pre-training Differentially Private Models with Limited Public Data
Zhiqi Bu
Xinwei Zhang
Mingyi Hong
Sheng Zha
George Karypis
302
6
0
28 Feb 2024
Unveiling Privacy, Memorization, and Input Curvature Links
Deepak Ravikumar
Efstathia Soufleri
Abolfazl Hashemi
Kaushik Roy
297
13
0
28 Feb 2024
Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem
Samuel J. K. Chin
Matthias Winkenbach
Akash Srivastava
190
0
0
28 Feb 2024
Layer-wise Regularized Dropout for Neural Language Models
Shiwen Ni
Min Yang
Ruifeng Xu
Chengming Li
Xiping Hu
126
0
0
26 Feb 2024
Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Tian Wang
212
1
0
24 Feb 2024
Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization
Zirui Zhu
Yong Liu
Zangwei Zheng
Huifeng Guo
Yang You
149
0
0
23 Feb 2024
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Yihao Zhang
Hangzhou He
Jingyu Zhu
Huanran Chen
Yifei Wang
Zeming Wei
AAML
390
24
0
23 Feb 2024
NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning
Dhananjay Saikumar
Blesson Varghese
238
2
0
21 Feb 2024
Investigating the Histogram Loss in Regression
Ehsan Imani
Kai Luedemann
Sam Scholnick-Hughes
Esraa Elelimy
Martha White
UQCV
165
10
0
20 Feb 2024
Scaling physics-informed hard constraints with mixture-of-experts
N. Chalapathi
Yiheng Du
Aditi Krishnapriyan
AI4CE
231
27
0
20 Feb 2024
OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations
Yao Shu
Jiongfeng Fang
Y. He
Fei Richard Yu
165
0
0
18 Feb 2024
AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau
Han Liu
Mladen Kolar
ODL
402
9
0
17 Feb 2024
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Romain Ilbert
Ambroise Odonnat
Vasilii Feofanov
Aladin Virmaux
Giuseppe Paolo
Themis Palpanas
I. Redko
AI4TS
311
54
0
15 Feb 2024