ResearchTrend.AI

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
arXiv: 1609.04836

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Title
MGSER-SAM: Memory-Guided Soft Experience Replay with Sharpness-Aware Optimization for Enhanced Continual Learning
Xingyu Li
Bo Tang
VLM, CLL
64
0
0
15 May 2024
Why is SAM Robust to Label Noise?
Christina Baek
Zico Kolter
Aditi Raghunathan
NoLa, AAML
94
11
0
06 May 2024
Loss Jump During Loss Switch in Solving PDEs with Neural Networks
Zhiwei Wang
Lulu Zhang
Zhongwang Zhang
Z. Xu
58
0
0
06 May 2024
A separability-based approach to quantifying generalization: which layer is best?
Luciano Dyballa
Evan Gerritz
Steven W. Zucker
OOD
104
4
0
02 May 2024
PackVFL: Efficient HE Packing for Vertical Federated Learning
Liu Yang
Shuowei Cai
Di Chai
Junxue Zhang
Han Tian
Yilun Jin
Kun Guo
Kai Chen
Qiang Yang
FedML
58
1
0
01 May 2024
Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li
Junyu Liu
Hanrui Wang
Tianlong Chen
199
2
0
30 Apr 2024
Grad Queue : A probabilistic framework to reinforce sparse gradients
Irfan Mohammad Al Hasib
78
0
0
25 Apr 2024
Generalization Measures for Zero-Shot Cross-Lingual Transfer
Saksham Bassi
Duygu Ataman
Kyunghyun Cho
89
0
0
24 Apr 2024
A Hybrid Generative and Discriminative PointNet on Unordered Point Sets
Yang Ye
Shihao Ji
PINN, 3DPC
70
0
0
19 Apr 2024
Singular-limit analysis of gradient descent with noise injection
Anna Shalova
André Schlichting
M. Peletier
59
2
0
18 Apr 2024
QGen: On the Ability to Generalize in Quantization Aware Training
Mohammadhossein Askarihemmat
Ahmadreza Jeddi
Reyhane Askari Hemmat
Ivan Lazarevich
Alexander Hoffman
Sudhakar Sah
Ehsan Saboori
Yvon Savaria
Jean-Pierre David
MQ
89
1
0
17 Apr 2024
Flatness Improves Backbone Generalisation in Few-shot Classification
Rui Li
Martin Trapp
Talal Alrawajfeh
Arno Solin
111
0
0
11 Apr 2024
Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity
Zhanran Lin
Puheng Li
Lei Wu
255
9
0
09 Apr 2024
Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications
Lucas Böttcher
Gregory R. Wheeler
77
0
0
05 Apr 2024
Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks
Mohammed Ghaith Altarabichi
Sławomir Nowaczyk
Sepideh Pashami
Peyman Sheikholharam Mashhadi
Julia Handl
40
11
0
05 Apr 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks
Haiyun He
Christina Lee Yu
99
6
0
04 Apr 2024
Make Continual Learning Stronger via C-Flat
Ang Bian
Wei Li
Hangjie Yuan
Chengrong Yu
Zixiang Zhao
Mang Wang
Aojun Lu
Tao Feng
77
12
0
01 Apr 2024
Revisiting Random Weight Perturbation for Efficiently Improving Generalization
Tao Li
Qinghua Tao
Weihao Yan
Zehao Lei
Yingwen Wu
Kun Fang
Mingzhen He
Xiaolin Huang
AAML
101
6
0
30 Mar 2024
Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context
Tuan Nguyen
C. Fredouille
A. Ghio
M. Balaguer
Virginie Woisard
38
1
0
29 Mar 2024
On the Benefits of Over-parameterization for Out-of-Distribution Generalization
Yifan Hao
Yong Lin
Difan Zou
Tong Zhang
OODD, OOD
88
6
0
26 Mar 2024
Self-Supervised Multi-Frame Neural Scene Flow
Dongrui Liu
Daqi Liu
Xueqian Li
Sihao Lin
Hongwei Xie
Bing Wang
Xiaojun Chang
Lei Chu
135
3
0
24 Mar 2024
SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images
Yifei Wang
Chuhong Zhu
85
0
0
24 Mar 2024
Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning
Tausifa Jan Saleem
Ramanjit Ahuja
Surendra Prasad
Brejesh Lall
84
0
0
22 Mar 2024
Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
Anh-Vu Bui
Vy Vo
Tung Pham
Dinh Q. Phung
Trung Le
FedML, UQCV
70
1
0
19 Mar 2024
Friendly Sharpness-Aware Minimization
Tao Li
Pan Zhou
Zhengbao He
Xinwen Cheng
Xiaolin Huang
AAML
80
17
0
19 Mar 2024
Semiparametric Token-Sequence Co-Supervision
Hyunji Lee
Doyoung Kim
Jihoon Jun
Se June Joo
Joel Jang
Kyoung-Woon On
Minjoon Seo
114
1
0
14 Mar 2024
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé
P. D'Oro
Evgenii Nikishin
Razvan Pascanu
Pierre-Luc Bacon
A. Baratin
106
1
0
12 Mar 2024
Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning
Junseok Park
Yoonsung Kim
Hee Bin Yoo
Min Whoo Lee
Kibeom Kim
Won-Seok Choi
Minsu Lee
Byoung-Tak Zhang
OffRL
68
1
0
11 Mar 2024
Transformers Learn Low Sensitivity Functions: Investigations and Implications
Bhavya Vasudeva
Deqing Fu
Tianyi Zhou
Elliott Kau
Youqi Huang
Vatsal Sharan
89
2
0
11 Mar 2024
CarbonNet: How Computer Vision Plays a Role in Climate Change? Application: Learning Geomechanics from Subsurface Geometry of CCS to Mitigate Global Warming
Wei Chen
Yun Li
Yuan Tian
AI4CE
52
0
0
09 Mar 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato
Stavroula Mougiakakou
64
0
0
08 Mar 2024
GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models
Tolga Dimlioglu
A. Choromańska
73
4
0
07 Mar 2024
Non-Convex Stochastic Composite Optimization with Polyak Momentum
Yuan Gao
Anton Rodomanov
Sebastian U. Stich
73
8
0
05 Mar 2024
Level Set Teleportation: An Optimization Perspective
Aaron Mishkin
A. Bietti
Robert Mansel Gower
98
1
0
05 Mar 2024
A Survey on Evaluation of Out-of-Distribution Generalization
Han Yu
Jiashuo Liu
Xingxuan Zhang
Jiayun Wu
Peng Cui
OOD
101
9
0
04 Mar 2024
Merging Text Transformer Models from Different Initializations
Neha Verma
Maha Elbayad
MoMe
107
8
0
01 Mar 2024
Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms
Toki Tahmid Inan
Mingrui Liu
Amarda Shehu
54
0
0
01 Mar 2024
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou
Yicong Liu
Yiman Hu
Yuhua Li
Ruixuan Li
86
7
0
01 Mar 2024
Fine-tuning with Very Large Dropout
Jianyu Zhang
Léon Bottou
124
2
0
01 Mar 2024
Batch size invariant Adam
Xi Wang
Laurence Aitchison
87
2
0
29 Feb 2024
Gradient Alignment for Cross-Domain Face Anti-Spoofing
B. Le
Simon S. Woo
CVBM
83
20
0
29 Feb 2024
Pre-training Differentially Private Models with Limited Public Data
Zhiqi Bu
Xinwei Zhang
Mingyi Hong
Sheng Zha
George Karypis
114
4
0
28 Feb 2024
Unveiling Privacy, Memorization, and Input Curvature Links
Deepak Ravikumar
Efstathia Soufleri
Abolfazl Hashemi
Kaushik Roy
97
6
0
28 Feb 2024
Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem
Samuel J. K. Chin
Matthias Winkenbach
Akash Srivastava
59
0
0
28 Feb 2024
Layer-wise Regularized Dropout for Neural Language Models
Shiwen Ni
Min Yang
Ruifeng Xu
Chengming Li
Xiping Hu
46
0
0
26 Feb 2024
Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Tian Wang
70
1
0
24 Feb 2024
Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization
Zirui Zhu
Yong Liu
Zangwei Zheng
Huifeng Guo
Yang You
45
0
0
23 Feb 2024
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Yihao Zhang
Hangzhou He
Jingyu Zhu
Huanran Chen
Yifei Wang
Zeming Wei
AAML
120
15
0
23 Feb 2024
NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning
Dhananjay Saikumar
Blesson Varghese
65
1
0
21 Feb 2024
Investigating the Histogram Loss in Regression
Ehsan Imani
Kai Luedemann
Sam Scholnick-Hughes
Esraa Elelimy
Martha White
UQCV
55
6
0
20 Feb 2024