ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Simplicity Bias via Global Convergence of Sharpness Minimization
Khashayar Gatmiry
Zhiyuan Li
Sashank J. Reddi
Stefanie Jegelka
54
1
0
21 Oct 2024
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems
Bingcong Li
Liang Zhang
Niao He
93
8
0
18 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
84
3
0
18 Oct 2024
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Frederick Souza Leite
Henry Mauranen
Aziza Zhanabatyrova
Yu Xiao
59
2
0
17 Oct 2024
From promise to practice: realizing high-performance decentralized training
Zesen Wang
Jiaojiao Zhang
Xuyang Wu
M. Johansson
105
0
0
15 Oct 2024
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
Arpan Mukherjee
Shashanka Ubaru
K. Murugesan
Karthikeyan Shanmugam
A. Tajer
68
2
0
14 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
69
1
0
14 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
123
7
0
14 Oct 2024
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou
Mingze Wang
Yuchen Mao
Bingrui Li
Junchi Yan
AAML
130
1
0
14 Oct 2024
Understanding Adversarially Robust Generalization via Weight-Curvature Index
Yuelin Xu
Xiao Zhang
AAML
61
0
0
10 Oct 2024
OledFL: Unleashing the Potential of Decentralized Federated Learning via Opposite Lookahead Enhancement
Qinglun Li
Miao Zhang
Mengzhu Wang
Quanjun Yin
Li Shen
OODDFedML
59
0
0
09 Oct 2024
Extended convexity and smoothness and their applications in deep learning
Binchuan Qi
Wei Gong
Li Li
105
0
0
08 Oct 2024
QT-DoG: Quantization-aware Training for Domain Generalization
Saqib Javed
Hieu Le
Mathieu Salzmann
OOD MQ
110
2
0
08 Oct 2024
Incremental Learning for Robot Shared Autonomy
Yiran Tao
Guixiu Qiao
Dan Ding
Zackory Erickson
CLL
97
0
0
08 Oct 2024
Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization
Guy Kornowski
Daogao Liu
Kunal Talwar
67
2
0
08 Oct 2024
Intriguing Properties of Large Language and Vision Models
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Yechan Hwang
Ho-Jin Choi
LRM VLM
110
0
0
07 Oct 2024
Improving Generalization with Flat Hilbert Bayesian Inference
Tuan Truong
Quyen Tran
Quan Pham-Ngoc
Nhat Ho
Dinh Q. Phung
T. Le
71
1
0
05 Oct 2024
Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks
Ke Chen
Chugang Yi
Haizhao Yang
MLT
69
0
0
03 Oct 2024
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
Boqian Wu
Q. Xiao
Shunxin Wang
N. Strisciuglio
Mykola Pechenizkiy
M. V. Keulen
Decebal Constantin Mocanu
Elena Mocanu
OOD 3DH
206
3
0
03 Oct 2024
Revisiting Video Quality Assessment from the Perspective of Generalization
Xinli Yue
Jianhui Sun
Liangchao Yao
Fan Xia
Yuetang Deng
...
Lei Li
Fengyun Rao
Jing Lv
Qian Wang
Lingchen Zhao
MoMe
56
0
0
23 Sep 2024
Bilateral Sharpness-Aware Minimization for Flatter Minima
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Qingming Huang
AAML
449
0
0
20 Sep 2024
Hidden Activations Are Not Enough: A General Approach to Neural Network Predictions
Samuel Leblanc
Aiky Rasolomanana
Marco Armenta
74
0
0
20 Sep 2024
Efficient Training of Deep Neural Operator Networks via Randomized Sampling
Sharmila Karumuri
Lori Graham-Brady
Somdatta Goswami
75
2
0
20 Sep 2024
Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate
Hinata Harada
Hideaki Iiduka
54
1
0
16 Sep 2024
WaterMAS: Sharpness-Aware Maximization for Neural Network Watermarking
Carl De Sousa Trias
Mihai P. Mitrea
Attilio Fiandrotti
Marco Cagnazzo
Sumanta Chaudhuri
Enzo Tartaglione
AAML
50
1
0
05 Sep 2024
Improving Robustness to Multiple Spurious Correlations by Multi-Objective Optimization
Nayeong Kim
Juwon Kang
Sungsoo Ahn
Jungseul Ok
Suha Kwak
62
1
0
05 Sep 2024
CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
Rui Zeng
Xi Chen
Yuwen Pu
Xuhong Zhang
Tianyu Du
Shouling Ji
84
5
0
02 Sep 2024
Fisher Information guided Purification against Backdoor Attacks
Nazmul Karim
Abdullah Al Arafat
Adnan Siraj Rakin
Zhishan Guo
Nazanin Rahnavard
AAML
112
2
0
01 Sep 2024
Deep Learning to Predict Late-Onset Breast Cancer Metastasis: the Single Hyperparameter Grid Search (SHGS) Strategy for Meta Tuning Concerning Deep Feed-forward Neural Network
Yijun Zhou
Om Arora-Jain
Xia Jiang
OOD
51
2
0
28 Aug 2024
Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller
Mark Dredze
Nicholas Andrews
138
1
0
26 Aug 2024
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging
Yichu Xu
Xin-Chun Li
Le Gan
De-Chuan Zhan
MoMe
79
0
0
22 Aug 2024
A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning
Deyu Li
L. Xiao
Handi Wei
Yan Li
Binghua Zhang
72
0
0
20 Aug 2024
Enhancing Adversarial Transferability with Adversarial Weight Tuning
Jiahao Chen
Zhou Feng
Rui Zeng
Yuwen Pu
Chunyi Zhou
Yi Jiang
Yuyou Gan
Jinbao Li
Shouling Ji
AAML
106
1
0
18 Aug 2024
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
Kenzo Clauw
S. Stramaglia
Daniele Marinazzo
79
4
0
16 Aug 2024
Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
Xinyi Zhang
Hanyu Zhao
Wencong Xiao
Xianyan Jia
Fei Xu
Yong Li
Wei Lin
Fangming Liu
46
2
0
16 Aug 2024
Enhancing Sharpness-Aware Minimization by Learning Perturbation Radius
Xuehao Wang
Weisen Jiang
Shuai Fu
Yu Zhang
AAML
81
0
0
15 Aug 2024
Implicit Neural Representation For Accurate CFD Flow Field Prediction
L. D. Vito
Nils Pinnau
Simone Dey
AI4CE
86
1
0
12 Aug 2024
Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis?
Mohamed Hassan
Aleksandar Vakanski
Min Xian
AAML MedIm
89
1
0
07 Aug 2024
Exploring Loss Landscapes through the Lens of Spin Glass Theory
Hao Liao
Wei Zhang
Zhanyi Huang
Zexiao Long
Mingyang Zhou
Xiaoqun Wu
Rui Mao
Chi Ho Yeung
79
2
0
30 Jul 2024
Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Dennis Chemnitz
Maximilian Engel
68
0
0
29 Jul 2024
Local vs Global continual learning
Giulia Lanzillotta
Sidak Pal Singh
Benjamin Grewe
Thomas Hofmann
CLL
68
0
0
23 Jul 2024
Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance
Haiquan Lu
Xiaotian Liu
Yefan Zhou
Qunli Li
Kurt Keutzer
Michael W. Mahoney
Yujun Yan
Huanrui Yang
Yaoqing Yang
56
1
0
17 Jul 2024
Overcoming Catastrophic Forgetting in Federated Class-Incremental Learning via Federated Global Twin Generator
Thinh Nguyen
Khoa D. Doan
Binh T. Nguyen
Danh Le-Phuoc
Kok-Seng Wong
FedML
70
0
0
13 Jul 2024
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi
Albert Manuel Orozco Camacho
Eugene Belilovsky
Guy Wolf
FedML MoMe
51
10
0
07 Jul 2024
Multimodal Classification via Modal-Aware Interactive Enhancement
Qing-Yuan Jiang
Zhouyang Chi
Yang Yang
63
3
0
05 Jul 2024
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
161
26
0
05 Jul 2024
PaSE: Parallelization Strategies for Efficient DNN Training
Venmugil Elango
40
9
0
04 Jul 2024
Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
Amit Peleg
Matthias Hein
65
0
0
04 Jul 2024
Curvature Clues: Decoding Deep Learning Privacy with Input Loss Curvature
Deepak Ravikumar
Efstathia Soufleri
Kaushik Roy
70
0
0
03 Jul 2024
Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization
Hongjun Choi
Jayaraman J. Thiagarajan
Ruben Glatt
Shusen Liu
87
0
0
29 Jun 2024