On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
A Comprehensive Survey of Continual Learning: Theory, Method and Application
Liyuan Wang
Xingxing Zhang
Hang Su
Jun Zhu
KELM CLL
219
712
0
31 Jan 2023
Emergence of Maps in the Memories of Blind Navigation Agents
Erik Wijmans
Manolis Savva
Irfan Essa
Stefan Lee
Ari S. Morcos
Dhruv Batra
67
33
0
30 Jan 2023
Deep networks for system identification: a Survey
G. Pillonetto
Aleksandr Aravkin
Daniel Gedon
L. Ljung
Antônio H. Ribeiro
Thomas B. Schon
OOD
100
45
0
30 Jan 2023
The Hidden Power of Pure 16-bit Floating-Point Neural Networks
Juyoung Yun
Byungkon Kang
Zhoulai Fu
MQ
28
1
0
30 Jan 2023
Do We Really Need Graph Neural Networks for Traffic Forecasting?
Xu Liu
Yuxuan Liang
Chao Huang
Hengchang Hu
Yushi Cao
Bryan Hooi
Roger Zimmermann
AI4TS
93
22
0
30 Jan 2023
Pipe-BD: Pipelined Parallel Blockwise Distillation
Hongsun Jang
Jaewon Jung
Jaeyong Song
Joonsang Yu
Youngsok Kim
Jinho Lee
MoE AI4CE
66
2
0
29 Jan 2023
Exploring the Effect of Multi-step Ascent in Sharpness-Aware Minimization
Hoki Kim
Jinseong Park
Yujin Choi
Woojin Lee
Jaewook Lee
44
9
0
27 Jan 2023
ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients
Guihong Li
Yuedong Yang
Kartikeya Bhardwaj
R. Marculescu
117
63
0
26 Jan 2023
On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems
Philippe Gonzalez
T. S. Alstrøm
Tobias May
71
9
0
25 Jan 2023
ScaDLES: Scalable Deep Learning over Streaming data at the Edge
S. Tyagi
Martin Swany
42
6
0
21 Jan 2023
An SDE for Modeling SAM: Theory and Insights
Enea Monzio Compagnoni
Luca Biggio
Antonio Orvieto
F. Proske
Hans Kersting
Aurelien Lucchi
110
15
0
19 Jan 2023
Catapult Dynamics and Phase Transitions in Quadratic Nets
David Meltzer
Junyu Liu
67
9
0
18 Jan 2023
Stability Analysis of Sharpness-Aware Minimization
Hoki Kim
Jinseong Park
Yujin Choi
Jaewook Lee
78
13
0
16 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
Xin Ma
Chang-Shu Liu
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
137
9
0
31 Dec 2022
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
R. Venkatesh Babu
75
31
0
28 Dec 2022
Deep learning for size-agnostic inverse design of random-network 3D printed mechanical metamaterials
H. Pahlavani
Kostas Tsifoutis-Kazolis
P. Mody
Jie Zhou
M. J. Mirzaali
A. A. Zadpoor
AI4CE
64
41
0
22 Dec 2022
Domain Generalization with Correlated Style Uncertainty
Zheyu Zhang
Bin Wang
Debesh Jha
Ugur Demir
Ulas Bagci
OOD
106
6
0
20 Dec 2022
Colab NAS: Obtaining lightweight task-specific convolutional neural networks following Occam's razor
Andrea Mattia Garavagno
D. Leonardis
A. Frisoli
96
1
0
15 Dec 2022
A Statistical Model for Predicting Generalization in Few-Shot Classification
Yassir Bendou
Vincent Gripon
Bastien Pasdeloup
Lukas Mauch
Stefan Uhlich
Fabien Cardinaux
G. B. Hacene
Javier Alonso García
87
2
0
13 Dec 2022
Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
Peng Lu
I. Kobyzev
Mehdi Rezagholizadeh
Ahmad Rashid
A. Ghodsi
Philippe Langlais
MoMe
100
11
0
12 Dec 2022
Accelerating Self-Supervised Learning via Efficient Training Strategies
Mustafa Taha Koçyiğit
Timothy M. Hospedales
Hakan Bilen
SSL
66
8
0
11 Dec 2022
Error-aware Quantization through Noise Tempering
Zheng Wang
Juncheng Billy Li
Shuhui Qu
Florian Metze
Emma Strubell
MQ
38
2
0
11 Dec 2022
Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
Xu Cai
Chaobing Song
Stephen J. Wright
Jelena Diakonikolas
80
14
0
09 Dec 2022
Adversarial Weight Perturbation Improves Generalization in Graph Neural Networks
Yihan Wu
Aleksandar Bojchevski
Heng Huang
AAML
99
30
0
09 Dec 2022
Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization
Kayhan Behdin
Qingquan Song
Aman Gupta
D. Durfee
Ayan Acharya
S. Keerthi
Rahul Mazumder
AAML
53
5
0
07 Dec 2022
Convergence of ease-controlled Random Reshuffling gradient Algorithms under Lipschitz smoothness
R. Seccia
Corrado Coppola
G. Liuzzi
L. Palagi
61
2
0
04 Dec 2022
PiPar: Pipeline Parallelism for Collaborative Machine Learning
Zihan Zhang
Philip Rodgers
Peter Kilpatrick
I. Spence
Blesson Varghese
FedML
80
3
0
01 Dec 2022
Task Discovery: Finding the Tasks that Neural Networks Generalize on
Andrei Atanov
Andrei Filatov
Teresa Yeo
Ajay Sohmshetty
Amir Zamir
OOD
132
10
0
01 Dec 2022
Adaptive adversarial training method for improving multi-scale GAN based on generalization bound theory
Jin-Lin Tang
B. Tao
Zeyu Gong
Zhoupin Yin
AI4CE
61
1
0
30 Nov 2022
Boosted Dynamic Neural Networks
Haichao Yu
Haoxiang Li
G. Hua
Gao Huang
Humphrey Shi
96
8
0
30 Nov 2022
Disentangling the Mechanisms Behind Implicit Regularization in SGD
Cheng-i Wang
Simran Kaur
Tanya Marwah
Saurabh Garg
Zachary Chase Lipton
FedML
100
2
0
29 Nov 2022
A survey of deep learning optimizers -- first and second order methods
Rohan Kashyap
ODL
97
7
0
28 Nov 2022
Exploring Temporal Information Dynamics in Spiking Neural Networks
Youngeun Kim
Yuhang Li
Hyoungseob Park
Yeshwanth Venkatesha
Anna Hambitzer
Priyadarshini Panda
86
35
0
26 Nov 2022
The Vanishing Decision Boundary Complexity and the Strong First Component
Hengshuai Yao
UQCV
59
0
0
25 Nov 2022
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Kazuki Osawa
Shigang Li
Torsten Hoefler
AI4CE
84
26
0
25 Nov 2022
Cross-Domain Ensemble Distillation for Domain Generalization
Kyung-Jin Lee
Sungyeon Kim
Suha Kwak
FedML OOD
79
38
0
25 Nov 2022
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi
Marc Finzi
Sanyam Kapoor
Andres Potapczynski
Micah Goldblum
A. Wilson
BDL MLT AI4CE
87
62
0
24 Nov 2022
Improving Multi-task Learning via Seeking Task-based Flat Regions
Hoang Phan
Lam C. Tran
Ngoc N. Tran
Tuan Truong
Qi Lei
Nhat Ho
Dinh Q. Phung
Trung Le
209
11
0
24 Nov 2022
ModelDiff: A Framework for Comparing Learning Algorithms
Harshay Shah
Sung Min Park
Andrew Ilyas
Aleksander Madry
SyDa
104
29
0
22 Nov 2022
Efficient Generalization Improvement Guided by Random Weight Perturbation
Tao Li
Wei Yan
Zehao Lei
Yingwen Wu
Kun Fang
Ming-Hsuan Yang
Xiaolin Huang
AAML
68
6
0
21 Nov 2022
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation
Jiawei Du
Yiding Jiang
Vincent Y. F. Tan
Qiufeng Wang
Haizhou Li
DD
99
119
0
20 Nov 2022
SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for Improving DNN Generalization and Robustness
Gonçalo Mordido
Sébastien Henwood
Sarath Chandar
François Leduc-Primeau
AAML
42
0
0
18 Nov 2022
REPAIR: REnormalizing Permuted Activations for Interpolation Repair
Keller Jordan
Hanie Sedghi
O. Saukh
R. Entezari
Behnam Neyshabur
MoMe
122
101
0
15 Nov 2022
Towards A Unified Conformer Structure: from ASR to ASV Task
Dexin Liao
Tao Jiang
Feng Wang
Lin Li
Q. Hong
86
10
0
14 Nov 2022
How Does Sharpness-Aware Minimization Minimize Sharpness?
Kaiyue Wen
Tengyu Ma
Zhiyuan Li
AAML
85
50
0
10 Nov 2022
Instance-Dependent Generalization Bounds via Optimal Transport
Songyan Hou
Parnian Kassraie
Anastasis Kratsios
Andreas Krause
Jonas Rothfuss
100
6
0
02 Nov 2022
Class Interference of Deep Neural Networks
Dongcui Diao
Hengshuai Yao
Bei Jiang
46
1
0
31 Oct 2022
Symmetries, flat minima, and the conserved quantities of gradient flow
Bo Zhao
I. Ganev
Robin Walters
Rose Yu
Nima Dehmamy
105
20
0
31 Oct 2022
Flatter, faster: scaling momentum for optimal speedup of SGD
Aditya Cowsik
T. Can
Paolo Glorioso
98
5
0
28 Oct 2022
Watermarking for Out-of-distribution Detection
Qizhou Wang
Feng Liu
Yonggang Zhang
Jing Zhang
Chen Gong
Tongliang Liu
Bo Han
OODD
88
32
0
27 Oct 2022