1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Title
Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter Alignment
Matt Gorbett
Hossein Shirazi
Indrakshi Ray
FedML
74
2
0
08 Nov 2023
EControl: Fast Distributed Optimization with Compression and Error Control
Yuan Gao
Rustem Islamov
Sebastian U. Stich
81
8
0
06 Nov 2023
The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning
Artyom Gadetsky
Maria Brbić
69
7
0
06 Nov 2023
Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao
Guisong Chang
Jiaqi Zhang
Qi Zhang
Dazhou Li
Yu Zhang
ODL
92
0
0
06 Nov 2023
Generalization Bounds for Label Noise Stochastic Gradient Descent
Jung Eun Huh
Patrick Rebeschini
61
1
0
01 Nov 2023
Density Estimation for Entry Guidance Problems using Deep Learning
Jens A. Rataczak
Davide Amato
Jay W. McMahon
29
2
0
30 Oct 2023
Proving Linear Mode Connectivity of Neural Networks via Optimal Transport
Damien Ferbach
Baptiste Goujaud
Gauthier Gidel
Aymeric Dieuleveut
MoMe
127
16
0
29 Oct 2023
Implicit Regularization in Over-Parameterized Support Vector Machine
Yang Sui
Xin He
Yang Bai
38
0
0
26 Oct 2023
FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning
Zhuo Huang
Li Shen
Jun-chen Yu
Bo Han
Tongliang Liu
FedML
104
23
0
25 Oct 2023
A Quadratic Synchronization Rule for Distributed Deep Learning
Xinran Gu
Kaifeng Lyu
Sanjeev Arora
Jingzhao Zhang
Longbo Huang
74
1
0
22 Oct 2023
Cooperative Minibatching in Graph Neural Networks
M. F. Balin
Dominique LaSalle
Ümit V. Çatalyürek
GNN
60
1
0
19 Oct 2023
Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
Gyuseong Lee
Wooseok Jang
Jin Hyeon Kim
Jaewoo Jung
Seungryong Kim
MoE
OOD
67
4
0
17 Oct 2023
"Reading Between the Heat": Co-Teaching Body Thermal Signatures for Non-intrusive Stress Detection
Yi Xiao
Harshit Sharma
Zhongyang Zhang
D. Bergen-Cico
Tauhidur Rahman
Asif Salekin
58
3
0
15 Oct 2023
Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning
Yihua Zhang
Yimeng Zhang
Aochuan Chen
Jinghan Jia
Jiancheng Liu
Gaowen Liu
Min-Fong Hong
Shiyu Chang
Sijia Liu
AAML
104
9
0
13 Oct 2023
Deep Concept Removal
Yegor Klochkov
Jean-François Ton
Ruocheng Guo
Yang Liu
Hang Li
48
0
0
09 Oct 2023
Entropy-MCMC: Sampling from Flat Basins with Ease
Bolian Li
Ruqi Zhang
63
5
0
09 Oct 2023
Why Do We Need Weight Decay in Modern Deep Learning?
Maksym Andriushchenko
Francesco D'Angelo
Aditya Varre
Nicolas Flammarion
98
38
0
06 Oct 2023
Small batch deep reinforcement learning
J. Obando-Ceron
Marc G. Bellemare
Pablo Samuel Castro
VLM
98
19
0
05 Oct 2023
TRAM: Bridging Trust Regions and Sharpness Aware Minimization
Tom Sherborne
Naomi Saphra
Pradeep Dasigi
Hao Peng
46
5
0
05 Oct 2023
Neural Language Model Pruning for Automatic Speech Recognition
Leonardo Emili
Thiago Fraga-Silva
Ernest Pusateri
M. Nußbaum-Thom
Youssef Oualil
79
1
0
05 Oct 2023
Modularity in Deep Learning: A Survey
Haozhe Sun
Isabelle Guyon
MoMe
92
3
0
02 Oct 2023
Stability and Generalization for Minibatch SGD and Local SGD
Yunwen Lei
Tao Sun
Mingrui Liu
77
4
0
02 Oct 2023
A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
Mingze Wang
Lei Wu
73
3
0
01 Oct 2023
Sharpness-Aware Teleportation on Riemannian Manifolds
Kenneth Allen
Hoang Nguyen
Haocheng Luo
Ming-Jun Lai
Mehrtash Harandi
Dinh Q. Phung
T. Le
AAML
95
3
0
29 Sep 2023
Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
M. Milling
Andreas Triantafyllopoulos
Iosif Tsangko
Simon Rampp
F. Schlüter
104
3
0
28 Sep 2023
Deep Model Fusion: A Survey
Weishi Li
Yong Peng
Miao Zhang
Liang Ding
Han Hu
Li Shen
FedML
MoMe
113
62
0
27 Sep 2023
Enhancing Sharpness-Aware Optimization Through Variance Suppression
Bingcong Li
G. Giannakis
AAML
112
23
0
27 Sep 2023
Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks
Yahong Yang
Qipin Chen
Wenrui Hao
51
4
0
26 Sep 2023
Neuro-Visualizer: An Auto-encoder-based Loss Landscape Visualization Method
Mohannad Elhamod
Anuj Karpatne
67
2
0
26 Sep 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs
Masahiro Tanaka
Chengming Zhang
Minjia Zhang
L. Song
Samyam Rajbhandari
Yuxiong He
75
121
0
25 Sep 2023
Revisiting LARS for Large Batch Training Generalization of Neural Networks
K. Do
Duong Nguyen
Hoa Nguyen
Long Tran-Thanh
Nguyen-Hoang Tran
Quoc-Viet Pham
AI4CE
ODL
69
1
0
25 Sep 2023
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang
Jinlong Liu
Zixiang Ding
Lin Guo
W. Lin
AI4CE
51
2
0
24 Sep 2023
Fantastic Generalization Measures are Nowhere to be Found
Michael C. Gastpar
Ido Nachum
Jonathan Shafer
T. Weinberger
89
15
0
24 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
Willie Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric Xing
105
50
0
19 Sep 2023
On the different regimes of Stochastic Gradient Descent
Antonio Sclocchi
Matthieu Wyart
60
20
0
19 Sep 2023
M³Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection
Yao Yuan
Pan Gao
Xiaoyang Tan
3DPC
91
4
0
15 Sep 2023
Gradient constrained sharpness-aware prompt learning for vision-language models
Liangchen Liu
Nannan Wang
Dawei Zhou
Xinbo Gao
Decheng Liu
Xi Yang
Tongliang Liu
VLM
68
2
0
14 Sep 2023
Do Generative Large Language Models need billions of parameters?
Sia Gholami
Marwan Omar
89
19
0
12 Sep 2023
Exploring Flat Minima for Domain Generalization with Large Learning Rates
Jian Zhang
Lei Qi
Yinghuan Shi
Yang Gao
76
3
0
12 Sep 2023
Split-Boost Neural Networks
R. G. Cestari
Gabriele Maroni
Loris Cannelli
Dario Piga
Simone Formentin
30
1
0
06 Sep 2023
A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness
Ze Peng
Lei Qi
Yinghuan Shi
Yang Gao
121
5
0
06 Sep 2023
Epi-Curriculum: Episodic Curriculum Learning for Low-Resource Domain Adaptation in Neural Machine Translation
Keyu Chen
Zhuang Di
Mingchen Li
J. M. Chang
110
3
0
06 Sep 2023
Learning Driver Models for Automated Vehicles via Knowledge Sharing and Personalization
Wissam Kontar
Xinzhi Zhong
Soyoung Ahn
68
0
0
31 Aug 2023
FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning
Gihun Lee
Minchan Jeong
Sangmook Kim
Jaehoon Oh
Se-Young Yun
FedML
69
9
0
24 Aug 2023
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers
N. Frumkin
Dibakar Gope
Diana Marculescu
MQ
99
17
0
21 Aug 2023
Latent State Models of Training Dynamics
Michael Y. Hu
Angelica Chen
Naomi Saphra
Kyunghyun Cho
99
8
0
18 Aug 2023
Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation
Shengcao Cao
Mengtian Li
James Hays
Deva Ramanan
Yu-Xiong Wang
Liangyan Gui
VLM
74
12
0
17 Aug 2023
Radiomics-Informed Deep Learning for Classification of Atrial Fibrillation Sub-Types from Left-Atrium CT Volumes
Weihang Dai
Xuelong Li
Taihui Yu
Di Zhao
Jun Shen
Kwang-Ting Cheng
58
0
0
14 Aug 2023
Noise Balance and Stationary Distribution of Stochastic Gradient Descent
Liu Ziyin
Hongchao Li
Masakuni Ueda
56
9
0
13 Aug 2023
Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation
Xuantong Liu
Yaoyao Zhong
Yuhang Zhang
Lixiong Qin
Weihong Deng
AAML
94
25
0
11 Aug 2023