Train longer, generalize better: closing the generalization gap in large batch training of neural networks
24 May 2017
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
Papers citing
"Train longer, generalize better: closing the generalization gap in large batch training of neural networks"
50 / 465 papers shown
Spreeze: High-Throughput Parallel Reinforcement Learning Framework
Jing Hou
Guang Chen
Ruiqi Zhang
Zhijun Li
Shangding Gu
Changjun Jiang
OffRL
164
3
0
11 Dec 2023
BCN: Batch Channel Normalization for Image Classification
Afifa Khaled
Chao Li
Jia Ning
Kun He
159
14
0
01 Dec 2023
LEOD: Label-Efficient Object Detection for Event Cameras
Computer Vision and Pattern Recognition (CVPR), 2023
Ziyi Wu
Mathias Gehrig
Qing Lyu
Xudong Liu
Igor Gilitschenski
258
30
0
29 Nov 2023
Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato
Hideaki Iiduka
384
4
0
15 Nov 2023
Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective
Neural Information Processing Systems (NeurIPS), 2023
Yifei Wang
Liangchen Li
Jiansheng Yang
Zhouchen Lin
Yisen Wang
282
19
0
30 Oct 2023
rTsfNet: a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction for IMU-based Human Activity Recognition
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2023
Yu Enokibori
194
4
0
30 Oct 2023
Stable and Interpretable Deep Learning for Tabular Data: Introducing InterpreTabNet with the Novel InterpreStability Metric
Shiyun Wa
Xinai Lu
Minjuan Wang
174
1
0
04 Oct 2023
YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
International Conference on Compiler Construction (CC), 2023
Cyrus Zhou
Zack Hassman
Ruize Xu
Dhirpal Shah
Vaughn Richard
Yanjing Li
501
5
0
01 Oct 2023
Masked Autoencoders are Scalable Learners of Cellular Morphology
Oren Z. Kraus
Kian Kenyon-Dean
Saber Saberian
Maryam Fallah
Peter McLean
...
Chi Vicky Cheng
Kristen Morse
Maureen Makes
Ben Mabey
Berton Earnshaw
275
19
0
27 Sep 2023
Deep Model Fusion: A Survey
Weishi Li
Yong Peng
Miao Zhang
Liang Ding
Han Hu
Li Shen
FedML
MoMe
306
89
0
27 Sep 2023
Revisiting LARS for Large Batch Training Generalization of Neural Networks
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
K. Do
Duong Nguyen
Hoa Nguyen
Long Tran-Thanh
Nguyen-Hoang Tran
Quoc-Viet Pham
AI4CE
ODL
354
6
0
25 Sep 2023
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang
Jinlong Liu
Zixiang Ding
Lin Guo
W. Lin
AI4CE
221
2
0
24 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
Willie Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric Xing
442
71
0
19 Sep 2023
On the different regimes of Stochastic Gradient Descent
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2023
Antonio Sclocchi
Matthieu Wyart
387
32
0
19 Sep 2023
Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data
Journal of Machine Learning Research (JMLR), 2023
Wanli Hong
Shuyang Ling
223
31
0
18 Sep 2023
No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets
Lorenzo Brigato
Stavroula Mougiakakou
208
5
0
04 Sep 2023
On the Implicit Bias of Adam
International Conference on Machine Learning (ICML), 2023
M. D. Cattaneo
Jason M. Klusowski
Boris Shigida
463
23
0
31 Aug 2023
FwdLLM: Efficient FedLLM using Forward Gradient
Mengwei Xu
Dongqi Cai
Yaozong Wu
Xiang Li
Shangguang Wang
FedML
255
34
0
26 Aug 2023
Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation
IEEE International Conference on Computer Vision (ICCV), 2023
Xuantong Liu
Yaoyao Zhong
Yuhang Zhang
Lixiong Qin
Weihong Deng
AAML
296
37
0
11 Aug 2023
G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
Xingyu Li
Bo Tang
AAML
214
1
0
07 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
155
0
0
05 Aug 2023
Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang
Shaoshuai Shi
Yue Liu
218
1
0
04 Aug 2023
GeneMask: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning
European Conference on Artificial Intelligence (ECAI), 2023
Soumyadeep Roy
Jonas Wallat
Sowmya S. Sundaram
Wolfgang Nejdl
Niloy Ganguly
200
3
0
29 Jul 2023
The instabilities of large learning rate training: a loss landscape view
Lawrence Wang
Stephen J. Roberts
155
3
0
22 Jul 2023
Addressing caveats of neural persistence with deep graph persistence
Leander Girrbach
Anders Christensen
Ole Winther
Zeynep Akata
A. Sophia Koepke
GNN
388
1
0
20 Jul 2023
Accelerating Distributed ML Training via Selective Synchronization
IEEE International Conference on Cluster Computing (CLUSTER), 2023
S. Tyagi
Martin Swany
FedML
309
7
0
16 Jul 2023
CAME: Confidence-guided Adaptive Memory Efficient Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yang Luo
Xiaozhe Ren
Zangwei Zheng
Zhuo Jiang
Xin Jiang
Yang You
ODL
345
35
0
05 Jul 2023
A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks
Neural Information Processing Systems (NeurIPS), 2023
Vignesh Kothapalli
Tom Tirer
Joan Bruna
267
17
0
04 Jul 2023
Black holes and the loss landscape in machine learning
Journal of High Energy Physics (JHEP), 2023
P. Kumar
Taniya Mandal
Swapnamay Mondal
202
2
0
26 Jun 2023
Scaling MLPs: A Tale of Inductive Bias
Neural Information Processing Systems (NeurIPS), 2023
Gregor Bachmann
Sotiris Anagnostidis
Thomas Hofmann
374
54
0
23 Jun 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Neural Information Processing Systems (NeurIPS), 2023
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
356
4
0
18 Jun 2023
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
International Conference on Machine Learning (ICML), 2023
Nikhil Vyas
Depen Morwani
Rosie Zhao
Gal Kaplun
Sham Kakade
Boaz Barak
MLT
271
7
0
14 Jun 2023
Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
Shahar Stein Ioushua
Inbar Hasidim
O. Shayevitz
M. Feder
273
1
0
14 Jun 2023
Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates
ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2023
Efstathia Soufleri
Gang Yan
Maroun Touma
Jian Li
248
7
0
11 Jun 2023
Anti-Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances in Flat Directions
Marcel Kühn
B. Rosenow
364
5
0
08 Jun 2023
Normalization Layers Are All That Sharpness-Aware Minimization Needs
Neural Information Processing Systems (NeurIPS), 2023
Maximilian Mueller
Tiffany J. Vlaar
David Rolnick
Matthias Hein
283
32
0
07 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
International Conference on Machine Learning (ICML), 2023
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Weilong Dai
Dacheng Tao
652
19
0
05 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao Song
Wanrong Zhu
263
29
0
04 Jun 2023
Stochastic Gradient Langevin Dynamics Based on Quantization with Increasing Resolution
Jinwuk Seok
Chang-Jae Cho
283
0
0
30 May 2023
SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters
Lawrence Wang
Stephen J. Roberts
242
0
0
29 May 2023
Ghost Noise for Regularizing Deep Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2023
Atli Kosson
Dongyang Fan
Martin Jaggi
302
2
0
26 May 2023
Batch Model Consolidation: A Multi-Task Model Consolidation Framework
Computer Vision and Pattern Recognition (CVPR), 2023
Iordanis Fostiropoulos
Jiaye Zhu
Laurent Itti
MoMe
CLL
170
3
0
25 May 2023
On the Optimal Batch Size for Byzantine-Robust Distributed Learning
Yi-Rui Yang
Chang-Wei Shi
Wu-Jun Li
FedML
AAML
261
1
0
23 May 2023
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Andrei Kucharavy
R. Guerraoui
Ljiljana Dolamic
228
1
0
20 May 2023
GeNAS: Neural Architecture Search with Better Generalization
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Joonhyun Jeong
Joonsang Yu
Geondo Park
Dongyoon Han
Y. Yoo
197
4
0
15 May 2023
Improving Stain Invariance of CNNs for Segmentation by Fusing Channel Attention and Domain-Adversarial Training
International Conference on Medical Imaging with Deep Learning (MIDL), 2023
Kudaibergen Abutalip
Numan Saeed
Mustaqeem Khan
Abdulmotaleb El Saddik
144
2
0
22 Apr 2023
A Neural Network Transformer Model for Composite Microstructure Homogenization
Engineering Applications of Artificial Intelligence (Eng. Appl. Artif. Intell.), 2023
Emil Pitz
K. Pochiraju
AI4CE
283
16
0
16 Apr 2023
Deep neural networks have an inbuilt Occam's razor
Nature Communications (Nat. Commun.), 2023
Chris Mingard
Henry Rees
Guillermo Valle Pérez
A. Louis
UQCV
BDL
293
16
0
13 Apr 2023
SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
Neural Information Processing Systems (NeurIPS), 2023
Kfir Y. Levy
FedML
253
4
0
09 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
296
51
0
07 Apr 2023
Page 2 of 10