
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
arXiv:1609.04836

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter Alignment
Matt Gorbett
Hossein Shirazi
Indrakshi Ray
FedML
74
2
0
08 Nov 2023
EControl: Fast Distributed Optimization with Compression and Error Control
Yuan Gao
Rustem Islamov
Sebastian U. Stich
81
8
0
06 Nov 2023
The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning
Artyom Gadetsky
Maria Brbić
69
7
0
06 Nov 2023
Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao
Guisong Chang
Jiaqi Zhang
Qi Zhang
Dazhou Li
Yu Zhang
ODL
92
0
0
06 Nov 2023
Generalization Bounds for Label Noise Stochastic Gradient Descent
Jung Eun Huh
Patrick Rebeschini
61
1
0
01 Nov 2023
Density Estimation for Entry Guidance Problems using Deep Learning
Jens A. Rataczak
Davide Amato
Jay W. McMahon
29
2
0
30 Oct 2023
Proving Linear Mode Connectivity of Neural Networks via Optimal Transport
Damien Ferbach
Baptiste Goujaud
Gauthier Gidel
Aymeric Dieuleveut
MoMe
127
16
0
29 Oct 2023
Implicit Regularization in Over-Parameterized Support Vector Machine
Yang Sui
Xin He
Yang Bai
38
0
0
26 Oct 2023
FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning
Zhuo Huang
Li Shen
Jun-chen Yu
Bo Han
Tongliang Liu
FedML
104
23
0
25 Oct 2023
A Quadratic Synchronization Rule for Distributed Deep Learning
Xinran Gu
Kaifeng Lyu
Sanjeev Arora
Jingzhao Zhang
Longbo Huang
74
1
0
22 Oct 2023
Cooperative Minibatching in Graph Neural Networks
M. F. Balin
Dominique LaSalle
Ümit V. Çatalyürek
GNN
60
1
0
19 Oct 2023
Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
Gyuseong Lee
Wooseok Jang
Jin Hyeon Kim
Jaewoo Jung
Seungryong Kim
MoE, OOD
67
4
0
17 Oct 2023
"Reading Between the Heat": Co-Teaching Body Thermal Signatures for Non-intrusive Stress Detection
Yi Xiao
Harshit Sharma
Zhongyang Zhang
D. Bergen-Cico
Tauhidur Rahman
Asif Salekin
58
3
0
15 Oct 2023
Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning
Yihua Zhang
Yimeng Zhang
Aochuan Chen
Jinghan Jia
Jiancheng Liu
Gaowen Liu
Min-Fong Hong
Shiyu Chang
Sijia Liu
AAML
104
9
0
13 Oct 2023
Deep Concept Removal
Yegor Klochkov
Jean-François Ton
Ruocheng Guo
Yang Liu
Hang Li
48
0
0
09 Oct 2023
Entropy-MCMC: Sampling from Flat Basins with Ease
Bolian Li
Ruqi Zhang
63
5
0
09 Oct 2023
Why Do We Need Weight Decay in Modern Deep Learning?
Maksym Andriushchenko
Francesco D'Angelo
Aditya Varre
Nicolas Flammarion
98
38
0
06 Oct 2023
Small batch deep reinforcement learning
J. Obando-Ceron
Marc G. Bellemare
Pablo Samuel Castro
VLM
98
19
0
05 Oct 2023
TRAM: Bridging Trust Regions and Sharpness Aware Minimization
Tom Sherborne
Naomi Saphra
Pradeep Dasigi
Hao Peng
46
5
0
05 Oct 2023
Neural Language Model Pruning for Automatic Speech Recognition
Leonardo Emili
Thiago Fraga-Silva
Ernest Pusateri
M. Nußbaum-Thom
Youssef Oualil
79
1
0
05 Oct 2023
Modularity in Deep Learning: A Survey
Haozhe Sun
Isabelle Guyon
MoMe
92
3
0
02 Oct 2023
Stability and Generalization for Minibatch SGD and Local SGD
Yunwen Lei
Tao Sun
Mingrui Liu
77
4
0
02 Oct 2023
A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
Mingze Wang
Lei Wu
73
3
0
01 Oct 2023
Sharpness-Aware Teleportation on Riemannian Manifolds
Kenneth Allen
Hoang Nguyen
Haocheng Luo
Ming-Jun Lai
Mehrtash Harandi
Dinh Q. Phung
T. Le
AAML
95
3
0
29 Sep 2023
Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
M. Milling
Andreas Triantafyllopoulos
Iosif Tsangko
Simon Rampp
F. Schlüter
104
3
0
28 Sep 2023
Deep Model Fusion: A Survey
Weishi Li
Yong Peng
Miao Zhang
Liang Ding
Han Hu
Li Shen
FedML, MoMe
113
62
0
27 Sep 2023
Enhancing Sharpness-Aware Optimization Through Variance Suppression
Bingcong Li
G. Giannakis
AAML
112
23
0
27 Sep 2023
Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks
Yahong Yang
Qipin Chen
Wenrui Hao
51
4
0
26 Sep 2023
Neuro-Visualizer: An Auto-encoder-based Loss Landscape Visualization Method
Mohannad Elhamod
Anuj Karpatne
67
2
0
26 Sep 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs
Masahiro Tanaka
Chengming Zhang
Minjia Zhang
L. Song
Samyam Rajbhandari
Yuxiong He
75
121
0
25 Sep 2023
Revisiting LARS for Large Batch Training Generalization of Neural Networks
K. Do
Duong Nguyen
Hoa Nguyen
Long Tran-Thanh
Nguyen-Hoang Tran
Quoc-Viet Pham
AI4CE, ODL
69
1
0
25 Sep 2023
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang
Jinlong Liu
Zixiang Ding
Lin Guo
W. Lin
AI4CE
51
2
0
24 Sep 2023
Fantastic Generalization Measures are Nowhere to be Found
Michael C. Gastpar
Ido Nachum
Jonathan Shafer
T. Weinberger
89
15
0
24 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
Willie Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric Xing
105
50
0
19 Sep 2023
On the different regimes of Stochastic Gradient Descent
Antonio Sclocchi
Matthieu Wyart
60
20
0
19 Sep 2023
M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection
Yao Yuan
Pan Gao
Xiaoyang Tan
3DPC
91
4
0
15 Sep 2023
Gradient constrained sharpness-aware prompt learning for vision-language models
Liangchen Liu
Nannan Wang
Dawei Zhou
Xinbo Gao
Decheng Liu
Xi Yang
Tongliang Liu
VLM
68
2
0
14 Sep 2023
Do Generative Large Language Models need billions of parameters?
Sia Gholami
Marwan Omar
89
19
0
12 Sep 2023
Exploring Flat Minima for Domain Generalization with Large Learning Rates
Jian Zhang
Lei Qi
Yinghuan Shi
Yang Gao
76
3
0
12 Sep 2023
Split-Boost Neural Networks
R. G. Cestari
Gabriele Maroni
Loris Cannelli
Dario Piga
Simone Formentin
30
1
0
06 Sep 2023
A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness
Ze Peng
Lei Qi
Yinghuan Shi
Yang Gao
121
5
0
06 Sep 2023
Epi-Curriculum: Episodic Curriculum Learning for Low-Resource Domain Adaptation in Neural Machine Translation
Keyu Chen
Zhuang Di
Mingchen Li
J. M. Chang
110
3
0
06 Sep 2023
Learning Driver Models for Automated Vehicles via Knowledge Sharing and Personalization
Wissam Kontar
Xinzhi Zhong
Soyoung Ahn
68
0
0
31 Aug 2023
FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning
Gihun Lee
Minchan Jeong
Sangmook Kim
Jaehoon Oh
Se-Young Yun
FedML
69
9
0
24 Aug 2023
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers
N. Frumkin
Dibakar Gope
Diana Marculescu
MQ
99
17
0
21 Aug 2023
Latent State Models of Training Dynamics
Michael Y. Hu
Angelica Chen
Naomi Saphra
Kyunghyun Cho
99
8
0
18 Aug 2023
Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation
Shengcao Cao
Mengtian Li
James Hays
Deva Ramanan
Yu-Xiong Wang
Liangyan Gui
VLM
74
12
0
17 Aug 2023
Radiomics-Informed Deep Learning for Classification of Atrial Fibrillation Sub-Types from Left-Atrium CT Volumes
Weihang Dai
Xuelong Li
Taihui Yu
Di Zhao
Jun Shen
Kwang-Ting Cheng
58
0
0
14 Aug 2023
Noise Balance and Stationary Distribution of Stochastic Gradient Descent
Liu Ziyin
Hongchao Li
Masakuni Ueda
56
9
0
13 Aug 2023
Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation
Xuantong Liu
Yaoyao Zhong
Yuhang Zhang
Lixiong Qin
Weihong Deng
AAML
94
25
0
11 Aug 2023