ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
arXiv: 1609.04836 (v2, latest)

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Hong Liu
Sang Michael Xie
Zhiyuan Li
Tengyu Ma
AI4CE
126
55
0
25 Oct 2022
Deep Neural Networks as the Semi-classical Limit of Topological Quantum Neural Networks: The problem of generalisation
A. Marcianò
De-Wei Chen
Filippo Fabrocini
C. Fields
M. Lulli
Emanuele Zappala
GNN
29
5
0
25 Oct 2022
Sufficient Invariant Learning for Distribution Shift
Taero Kim
Sungjun Lim
Kyungwoo Song
OOD
66
2
0
24 Oct 2022
K-SAM: Sharpness-Aware Minimization at the Speed of SGD
Renkun Ni
Ping Yeh-Chiang
Jonas Geiping
Micah Goldblum
A. Wilson
Tom Goldstein
64
9
0
23 Oct 2022
A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun
Konstantinos Papadopoulos
Djamila Aouada
AI4CE
73
12
0
21 Oct 2022
Large-batch Optimization for Dense Visual Predictions
Zeyue Xue
Jianming Liang
Guanglu Song
Zhuofan Zong
Liang Chen
Yu Liu
Ping Luo
VLM
96
9
0
20 Oct 2022
Motion correction in MRI using deep learning and a novel hybrid loss function
Lei Zhang
Xiaoke Wang
Michael Rawson
R. Balan
E. Herskovits
E. Melhem
Linda Chang
Ze Wang
T. Ernst
MedIm
84
13
0
19 Oct 2022
Rethinking Sharpness-Aware Minimization as Variational Inference
Szilvia Ujváry
Zsigmond Telek
A. Kerekes
Anna Mészáros
Ferenc Huszár
63
8
0
19 Oct 2022
Vision Transformers provably learn spatial structure
Samy Jelassi
Michael E. Sander
Yuan-Fang Li
ViT, MLT
100
83
0
13 Oct 2022
SQuAT: Sharpness- and Quantization-Aware Training for BERT
Zheng Wang
Juncheng Billy Li
Shuhui Qu
Florian Metze
Emma Strubell
MQ
42
7
0
13 Oct 2022
GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization
Zhiyuan Zhang
Ruixuan Luo
Qi Su
Xueting Sun
105
13
0
13 Oct 2022
Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks
A. K. Akash
Sixu Li
Nicolas García Trillos
71
13
0
13 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
107
51
0
13 Oct 2022
On the Effectiveness of Lipschitz-Driven Rehearsal in Continual Learning
Lorenzo Bonicelli
Matteo Boschini
Angelo Porrello
C. Spampinato
Simone Calderara
CLL
72
48
0
12 Oct 2022
Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models
Qihuang Zhong
Liang Ding
Li Shen
Peng Mi
Juhua Liu
Bo Du
Dacheng Tao
AAML
90
51
0
11 Oct 2022
Stable and Efficient Adversarial Training through Local Linearization
Zhuorong Li
Daiwei Yu
AAML
32
0
0
11 Oct 2022
SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko
Aditya Varre
Loucas Pillaud-Vivien
Nicolas Flammarion
136
60
0
11 Oct 2022
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Peng Mi
Li Shen
Tianhe Ren
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
Dacheng Tao
AAML
116
71
0
11 Oct 2022
TAN Without a Burn: Scaling Laws of DP-SGD
Tom Sander
Pierre Stock
Alexandre Sablayrolles
FedML
86
43
0
07 Oct 2022
Invariant Aggregator for Defending against Federated Backdoor Attacks
Xiaoya Wang
Dimitrios Dimitriadis
Oluwasanmi Koyejo
Shruti Tople
FedML
89
1
0
04 Oct 2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
Rohin Shah
Vikrant Varma
Ramana Kumar
Mary Phuong
Victoria Krakovna
J. Uesato
Zachary Kenton
92
72
0
04 Oct 2022
MEDFAIR: Benchmarking Fairness for Medical Imaging
Yongshuo Zong
Yongxin Yang
Timothy M. Hospedales
OOD
173
65
0
04 Oct 2022
TripleE: Easy Domain Generalization via Episodic Replay
Xuelong Li
Hongyu Ren
Huifeng Yao
Ziwei Liu
26
0
0
04 Oct 2022
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Peter L. Bartlett
Philip M. Long
Olivier Bousquet
162
37
0
04 Oct 2022
Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning
Pengfei Zheng
Rui Pan
Tarannum Khan
Shivaram Venkataraman
Aditya Akella
88
22
0
30 Sep 2022
Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alexandru Damian
Eshaan Nichani
Jason D. Lee
107
88
0
30 Sep 2022
Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
Sungyub Kim
Si-hun Park
Kyungsu Kim
Eunho Yang
BDL
79
5
0
30 Sep 2022
Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization
Danni Peng
Sinno Jialin Pan
64
3
0
29 Sep 2022
Label driven Knowledge Distillation for Federated Learning with non-IID Data
Minh-Duong Nguyen
Quoc-Viet Pham
D. Hoang
Long Tran-Thanh
Diep N. Nguyen
Won Joo Hwang
69
2
0
29 Sep 2022
Exploring the Relationship between Architecture and Adversarially Robust Generalization
Aishan Liu
Shiyu Tang
Siyuan Liang
Ruihao Gong
Boxi Wu
Xianglong Liu
Dacheng Tao
AAML
93
19
0
28 Sep 2022
A micromechanics-based recurrent neural networks model for path-dependent cyclic deformation of short fiber composites
J. Friemann
B. Dashtbozorg
Mikael Fagerström
S. Mirkhalaf
AI4CE
73
19
0
27 Sep 2022
Why neural networks find simple solutions: the many regularizers of geometric complexity
Benoit Dherin
Michael Munn
M. Rosca
David Barrett
133
31
0
27 Sep 2022
Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization
Gábor Melis
MoMe
93
1
0
26 Sep 2022
A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
James Harrison
Luke Metz
Jascha Narain Sohl-Dickstein
112
21
0
22 Sep 2022
Deep Double Descent via Smooth Interpolation
Matteo Gamba
Erik Englesson
Mårten Björkman
Hossein Azizpour
169
11
0
21 Sep 2022
Learning Symbolic Model-Agnostic Loss Functions via Meta-Learning
Christian Raymond
Qi Chen
Bing Xue
Mengjie Zhang
FedML
83
13
0
19 Sep 2022
Is Stochastic Gradient Descent Near Optimal?
Yifan Zhu
Hong Jun Jeon
Benjamin Van Roy
69
2
0
18 Sep 2022
Towards Bridging the Performance Gaps of Joint Energy-based Models
Xiulong Yang
Qing Su
Shihao Ji
VLM
65
15
0
16 Sep 2022
Losing momentum in continuous-time stochastic optimisation
Kexin Jin
J. Latz
Chenguang Liu
Alessandro Scagliotti
52
2
0
08 Sep 2022
Information Maximization for Extreme Pose Face Recognition
Mohammad Saeed Ebrahimi Saadabadi
Sahar Rahimi Malakshan
Sobhan Soleymani
Moktari Mostofa
Nasser M. Nasrabadi
CVBM
59
5
0
07 Sep 2022
Generalisation under gradient descent via deterministic PAC-Bayes
Eugenio Clerico
Tyler Farghly
George Deligiannidis
Benjamin Guedj
Arnaud Doucet
152
4
0
06 Sep 2022
Investigating the Impact of Model Misspecification in Neural Simulation-based Inference
Patrick W Cannon
Daniel Ward
Sebastian M. Schmon
78
36
0
05 Sep 2022
Super-model ecosystem: A domain-adaptation perspective
Fengxiang He
Dacheng Tao
DiffM
84
1
0
30 Aug 2022
Visualizing high-dimensional loss landscapes with Hessian directions
Lucas Böttcher
Gregory R. Wheeler
79
14
0
28 Aug 2022
On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi
FedML, AI4CE
91
81
0
26 Aug 2022
FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification
Yunqing Zhao
Ngai-Man Cheung
BDL
65
13
0
23 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective
Chanwoo Park
Sangdoo Yun
Sanghyuk Chun
AAML
83
32
0
21 Aug 2022
Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification
Quanshi Zhang
Xu Cheng
Yilan Chen
Zhefan Rao
56
36
0
18 Aug 2022
Object Detection for Autonomous Dozers
Chunfang Liu
Burhaneddin Yaman
68
2
0
17 Aug 2022
On the generalization of learning algorithms that do not converge
N. Chandramoorthy
Andreas Loukas
Khashayar Gatmiry
Stefanie Jegelka
MLT
91
11
0
16 Aug 2022