ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown
Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training
Tom Sander
Maxime Sylvestre
Alain Durmus
209
3
0
13 Feb 2024
Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors
D. Sahabandu
Xiaojun Xu
Arezoo Rajabi
Luyao Niu
Bhaskar Ramasubramanian
Bo Li
Radha Poovendran
AAML
208
1
0
12 Feb 2024
AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size
P. Ostroukhov
Aigerim Zhumabayeva
Chulu Xiang
Alexander Gasnikov
Martin Takáč
Dmitry Kamzolov
ODL
223
2
0
07 Feb 2024
Strong convexity-guided hyper-parameter optimization for flatter losses
Rahul Yedida
Snehanshu Saha
327
0
0
07 Feb 2024
Curvature-Informed SGD via General Purpose Lie-Group Preconditioners
Omead Brandon Pooladzandi
Xi-Lin Li
245
10
0
07 Feb 2024
Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation
Ossi Raisa
Hibiki Ito
Antti Honkela
249
8
0
06 Feb 2024
Deconstructing the Goldilocks Zone of Neural Network Initialization
International Conference on Machine Learning (ICML), 2024
Artem Vysogorets
Anna Dawid
Julia Kempe
249
3
0
05 Feb 2024
Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
Naoki Sato
Hideaki Iiduka
ODL
453
1
0
04 Feb 2024
BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning
International Journal of Computer Vision (IJCV), 2024
Baoyuan Wu
Hongrui Chen
Ruotong Wang
Zihao Zhu
Shaokui Wei
Danni Yuan
Mingli Zhu
Ke Xu
Li Liu
Chaoxiao Shen
AAML
ELM
281
19
0
26 Jan 2024
Catch-Up Mix: Catch-Up Class for Struggling Filters in CNN
AAAI Conference on Artificial Intelligence (AAAI), 2024
Minsoo Kang
Minkoo Kang
Suhyun Kim
129
7
0
24 Jan 2024
DALex: Lexicase-like Selection via Diverse Aggregation
European Conference on Genetic Programming (EuroGP), 2024
Andrew Ni
Lijie Ding
Lee Spector
260
8
0
23 Jan 2024
A Precise Characterization of SGD Stability Using Loss Surface Geometry
International Conference on Learning Representations (ICLR), 2024
Gregory Dexter
Borja Ocejo
S. Keerthi
Aman Gupta
Ayan Acharya
Rajiv Khanna
MLT
249
1
0
22 Jan 2024
Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data
Leonardo Castro-Gonzalez
Yi-Ling Chung
Hannah Rose Kirk
John Francis
Angus R. Williams
Pica Johansson
Jonathan Bright
284
2
0
22 Jan 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
500
10
0
22 Jan 2024
Understanding the Generalization Benefits of Late Learning Rate Decay
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yinuo Ren
Chao Ma
Lexing Ying
AI4CE
268
8
0
21 Jan 2024
The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness
Yifan Hao
Tong Zhang
AAML
508
6
0
19 Jan 2024
Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach
David Fleischhacker
Wolfgang Goederle
Roman Kern
109
6
0
15 Jan 2024
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
Chengli Tan
Jiangshe Zhang
Junmin Liu
Yicheng Wang
Yunda Hao
AAML
320
5
0
14 Jan 2024
EsaCL: Efficient Continual Learning of Sparse Models
SDM (SDM), 2024
Weijieying Ren
V. Honavar
CLL
198
4
0
11 Jan 2024
Standardizing Your Training Process for Human Activity Recognition Models: A Comprehensive Review in the Tunable Factors
International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous), 2024
Yiran Huang
Hai-qiang Zhao
Yexu Zhou
T. Riedel
Michael Beigl
123
3
0
10 Jan 2024
Preserving Silent Features for Domain Generalization
Chujie Zhao
Tianren Zhang
Feng Chen
277
0
0
06 Jan 2024
Enhancing Generalization of Invisible Facial Privacy Cloak via Gradient Accumulation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xuannan Liu
Yaoyao Zhong
Weihong Deng
Hongzhi Shi
Xingchen Cui
Yunfeng Yin
Dongchao Wen
PICV
FedML
181
2
0
03 Jan 2024
$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy
International Conference on Machine Learning (ICML), 2024
Nicola Novello
Andrea M. Tonello
316
17
0
02 Jan 2024
Hidden Minima in Two-Layer ReLU Networks
Yossi Arjevani
356
3
0
28 Dec 2023
Engineered Ordinary Differential Equations as Classification Algorithm (EODECA): thorough characterization and testing
Raffaele Marino
L. Buffoni
Lorenzo Chicchi
Lorenzo Giambagli
Duccio Fanelli
338
1
0
22 Dec 2023
CR-SAM: Curvature Regularized Sharpness-Aware Minimization
Tao Wu
Tie Luo
D. C. Wunsch
228
11
0
21 Dec 2023
Enhancing Neural Training via a Correlated Dynamics Model
Jonathan Brokman
Roy Betser
Rotem Turjeman
Tom Berkov
I. Cohen
Guy Gilboa
177
5
0
20 Dec 2023
LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate
Tao Wu
Tie Luo
D. C. Wunsch
256
7
0
20 Dec 2023
Doubly Perturbed Task Free Continual Learning
Byung Hyun Lee
Min-hwan Oh
Se Young Chun
344
5
0
20 Dec 2023
Sparse is Enough in Fine-tuning Pre-trained Large Language Models
Weixi Song
Z. Li
Lefei Zhang
Hai Zhao
Bo Du
VLM
377
12
0
19 Dec 2023
Mixture-of-Linear-Experts for Long-term Time Series Forecasting
Ronghao Ni
Zinan Lin
Shuaiqi Wang
Giulia Fanti
AI4TS
276
42
0
11 Dec 2023
PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition
Md Zarif Ul Alam
Md. Saiful Islam
Ehsan Hoque
M. S. Rahman
OOD
196
1
0
10 Dec 2023
Cross Domain Generative Augmentation: Domain Generalization with Latent Diffusion Models
S. Hemati
Mahdi Beitollahi
A. Estiri
Bassel Al Omari
Xi Chen
Guojun Zhang
175
9
0
08 Dec 2023
Simplifying Neural Network Training Under Class Imbalance
Neural Information Processing Systems (NeurIPS), 2023
Ravid Shwartz-Ziv
Micah Goldblum
Yucen Lily Li
C. Bayan Bruss
Andrew Gordon Wilson
275
32
0
05 Dec 2023
Optimal Sample Complexity of Contrastive Learning
International Conference on Learning Representations (ICLR), 2023
Noga Alon
Dmitrii Avdiukhin
Dor Elboim
Orr Fischer
G. Yaroslavtsev
SSL
291
11
0
01 Dec 2023
Directions of Curvature as an Explanation for Loss of Plasticity
Alex Lewandowski
Haruto Tanaka
Dale Schuurmans
Marlos C. Machado
453
16
0
30 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
802
2
0
29 Nov 2023
Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing
IEEE Transactions on Communications (IEEE Trans. Commun.), 2023
Zhengming Zhang
Yongming Huang
Cheng Zhang
Qingbi Zheng
Luxi Yang
Xiaohu You
271
39
0
28 Nov 2023
MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning
International Conference on Computing, Networking and Communications (ICNC), 2023
Soumya Banerjee
Sandip Roy
Sayyed Farid Ahamed
Devin Quinn
Marc Vucovich
Dhruv Nandakumar
K. Choi
Abdul Rahman
Edward Bowen
Sachin Shetty
255
10
0
28 Nov 2023
Should We Learn Most Likely Functions or Parameters?
Neural Information Processing Systems (NeurIPS), 2023
Shikai Qiu
Tim G. J. Rudner
Sanyam Kapoor
Andrew Gordon Wilson
142
11
0
27 Nov 2023
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
International Conference on Machine Learning (ICML), 2023
Mingze Wang
Zeping Min
Lei Wu
491
3
0
24 Nov 2023
SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape
Hua Zheng
Kuang-Hung Liu
Igor Fedorov
Xin Zhang
Wen-Yen Chen
Wei Wen
301
2
0
22 Nov 2023
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Zhang
Jiawei Du
Yunsong Li
Weiying Xie
Qiufeng Wang
350
30
0
22 Nov 2023
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection
Ahmed Sharshar
Aleksandr Matsun
198
5
0
21 Nov 2023
Generalization Bounds for Robust Contrastive Learning: From Theory to Practice
Ngoc N. Tran
Lam C. Tran
Hoang Phan
Anh-Vu Bui
Tung Pham
Toan M. Tran
Dinh Q. Phung
Trung Le
SSL
NoLa
384
0
0
16 Nov 2023
Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato
Hideaki Iiduka
394
4
0
15 Nov 2023
A PAC-Bayesian Perspective on the Interpolating Information Criterion
Liam Hodgkinson
Christopher van der Heide
Roberto Salomone
Fred Roosta
Michael W. Mahoney
275
2
0
13 Nov 2023
Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter Alignment
Matt Gorbett
Hossein Shirazi
Indrakshi Ray
FedML
425
2
0
08 Nov 2023
EControl: Fast Distributed Optimization with Compression and Error Control
International Conference on Learning Representations (ICLR), 2023
Yuan Gao
Rustem Islamov
Sebastian U. Stich
261
17
0
06 Nov 2023
The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning
Neural Information Processing Systems (NeurIPS), 2023
Artyom Gadetsky
Maria Brbić
274
9
0
06 Nov 2023