Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer, Itay Hubara, Daniel Soudry
ODL · 24 May 2017 · arXiv 1705.08741

Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Showing 50 of 465 citing papers · Page 2 of 10

Spreeze: High-Throughput Parallel Reinforcement Learning Framework
Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang
OffRL · 11 Dec 2023

BCN: Batch Channel Normalization for Image Classification
Afifa Khaled, Chao Li, Jia Ning, Kun He
01 Dec 2023

LEOD: Label-Efficient Object Detection for Event Cameras
Computer Vision and Pattern Recognition (CVPR), 2023
Ziyi Wu, Mathias Gehrig, Qing Lyu, Xudong Liu, Igor Gilitschenski
29 Nov 2023

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato, Hideaki Iiduka
15 Nov 2023

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective
Neural Information Processing Systems (NeurIPS), 2023
Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang
30 Oct 2023

rTsfNet: a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction for IMU-based Human Activity Recognition
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2023
Yu Enokibori
30 Oct 2023

Stable and Interpretable Deep Learning for Tabular Data: Introducing InterpreTabNet with the Novel InterpreStability Metric
Shiyun Wa, Xinai Lu, Minjuan Wang
04 Oct 2023

YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
International Conference on Compiler Construction (CC), 2023
Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaughn Richard, Yanjing Li
01 Oct 2023

Masked Autoencoders are Scalable Learners of Cellular Morphology
Oren Z. Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, ..., Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw
27 Sep 2023

Deep Model Fusion: A Survey
Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
FedML · MoMe · 27 Sep 2023

Revisiting LARS for Large Batch Training Generalization of Neural Networks
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
K. Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, Nguyen-Hoang Tran, Quoc-Viet Pham
AI4CE · ODL · 25 Sep 2023

Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang, Jinlong Liu, Zixiang Ding, Lin Guo, W. Lin
AI4CE · 24 Sep 2023

SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, ..., Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing
19 Sep 2023

On the different regimes of Stochastic Gradient Descent
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2023
Antonio Sclocchi, Matthieu Wyart
19 Sep 2023

Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data
Journal of Machine Learning Research (JMLR), 2023
Wanli Hong, Shuyang Ling
18 Sep 2023

No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets
Lorenzo Brigato, Stavroula Mougiakakou
04 Sep 2023

On the Implicit Bias of Adam
International Conference on Machine Learning (ICML), 2023
M. D. Cattaneo, Jason M. Klusowski, Boris Shigida
31 Aug 2023

FwdLLM: Efficient FedLLM using Forward Gradient
Mengwei Xu, Dongqi Cai, Yaozong Wu, Xiang Li, Shangguang Wang
FedML · 26 Aug 2023

Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation
IEEE International Conference on Computer Vision (ICCV), 2023
Xuantong Liu, Yaoyao Zhong, Yuhang Zhang, Lixiong Qin, Weihong Deng
AAML · 11 Aug 2023

G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
Xingyu Li, Bo Tang
AAML · 07 Aug 2023

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang, Ming Hao, Yuhai Shi, Bo Xu
MoMe · 05 Aug 2023

Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang, Shaoshuai Shi, Yue Liu
04 Aug 2023

GeneMask: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning
European Conference on Artificial Intelligence (ECAI), 2023
Soumyadeep Roy, Jonas Wallat, Sowmya S. Sundaram, Wolfgang Nejdl, Niloy Ganguly
29 Jul 2023

The instabilities of large learning rate training: a loss landscape view
Lawrence Wang, Stephen J. Roberts
22 Jul 2023

Addressing caveats of neural persistence with deep graph persistence
Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke
GNN · 20 Jul 2023

Accelerating Distributed ML Training via Selective Synchronization
IEEE International Conference on Cluster Computing (CLUSTER), 2023
S. Tyagi, Martin Swany
FedML · 16 Jul 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yang Luo, Xiaozhe Ren, Zangwei Zheng, Zhuo Jiang, Xin Jiang, Yang You
ODL · 05 Jul 2023

A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks
Neural Information Processing Systems (NeurIPS), 2023
Vignesh Kothapalli, Tom Tirer, Joan Bruna
04 Jul 2023

Black holes and the loss landscape in machine learning
Journal of High Energy Physics (JHEP), 2023
P. Kumar, Taniya Mandal, Swapnamay Mondal
26 Jun 2023

Scaling MLPs: A Tale of Inductive Bias
Neural Information Processing Systems (NeurIPS), 2023
Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
23 Jun 2023

DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Neural Information Processing Systems (NeurIPS), 2023
Niv Giladi, Shahar Gottlieb, Moran Shkolnik, A. Karnieli, Ron Banner, Elad Hoffer, Kfir Y. Levy, Daniel Soudry
18 Jun 2023

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
International Conference on Machine Learning (ICML), 2023
Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak
MLT · 14 Jun 2023

Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
Shahar Stein Ioushua, Inbar Hasidim, O. Shayevitz, M. Feder
14 Jun 2023

Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates
ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2023
Efstathia Soufleri, Gang Yan, Maroun Touma, Jian Li
11 Jun 2023

Anti-Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances in Flat Directions
Marcel Kühn, B. Rosenow
08 Jun 2023

Normalization Layers Are All That Sharpness-Aware Minimization Needs
Neural Information Processing Systems (NeurIPS), 2023
Maximilian Mueller, Tiffany J. Vlaar, David Rolnick, Matthias Hein
07 Jun 2023

Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
International Conference on Machine Learning (ICML), 2023
Tongtian Zhu, Fengxiang He, Kaixuan Chen, Weilong Dai, Dacheng Tao
05 Jun 2023

A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha, Zhao Song, Wanrong Zhu
04 Jun 2023

Stochastic Gradient Langevin Dynamics Based on Quantization with Increasing Resolution
Jinwuk Seok, Chang-Jae Cho
30 May 2023

SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters
Lawrence Wang, Stephen J. Roberts
29 May 2023

Ghost Noise for Regularizing Deep Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2023
Atli Kosson, Dongyang Fan, Martin Jaggi
26 May 2023

Batch Model Consolidation: A Multi-Task Model Consolidation Framework
Computer Vision and Pattern Recognition (CVPR), 2023
Iordanis Fostiropoulos, Jiaye Zhu, Laurent Itti
MoMe · CLL · 25 May 2023

On the Optimal Batch Size for Byzantine-Robust Distributed Learning
Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li
FedML · AAML · 23 May 2023

Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Andrei Kucharavy, R. Guerraoui, Ljiljana Dolamic
20 May 2023

GeNAS: Neural Architecture Search with Better Generalization
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Y. Yoo
15 May 2023

Improving Stain Invariance of CNNs for Segmentation by Fusing Channel Attention and Domain-Adversarial Training
International Conference on Medical Imaging with Deep Learning (MIDL), 2023
Kudaibergen Abutalip, Numan Saeed, Mustaqeem Khan, Abdulmotaleb El Saddik
22 Apr 2023

A Neural Network Transformer Model for Composite Microstructure Homogenization
Engineering Applications of Artificial Intelligence (Eng. Appl. Artif. Intell.), 2023
Emil Pitz, K. Pochiraju
AI4CE · 16 Apr 2023

Deep neural networks have an inbuilt Occam's razor
Nature Communications (Nat. Commun.), 2023
Chris Mingard, Henry Rees, Guillermo Valle Pérez, A. Louis
UQCV · BDL · 13 Apr 2023

SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
Neural Information Processing Systems (NeurIPS), 2023
Kfir Y. Levy
FedML · 09 Apr 2023

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
VLM · 07 Apr 2023