On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL
ArXiv (abs) · PDF · HTML

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Showing 50 of 1,554 citing papers.

Communication-efficient Decentralized Machine Learning over Heterogeneous Networks
Pan Zhou, Qian Lin, Dumitrel Loghin, Beng Chin Ooi, Yuncheng Wu, Hongfang Yu · 12 Sep 2020

Achieving Adversarial Robustness via Sparsity
Shu-Fan Wang, Ningyi Liao, Liyao Xiang, Nanyang Ye, Quanshi Zhang · AAML · 11 Sep 2020

Self-Adaptive Physics-Informed Neural Networks using a Soft Attention Mechanism
L. McClenny, U. Braga-Neto · PINN · 07 Sep 2020

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin · ODL · 05 Sep 2020

Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance
A. Lohn · 02 Sep 2020

Extreme Memorization via Scale of Initialization
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur · 31 Aug 2020

Predicting Training Time Without Training
Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto · 28 Aug 2020

Adversarially Robust Learning via Entropic Regularization
Gauri Jagatap, Ameya Joshi, A. B. Chowdhury, S. Garg, Chinmay Hegde · OOD · 27 Aug 2020

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, G. Ganger, Eric Xing · VLM · 27 Aug 2020

Traces of Class/Cross-Class Structure Pervade Deep Learning Spectra
Vardan Papyan · 27 Aug 2020

What is being transferred in transfer learning?
Behnam Neyshabur, Hanie Sedghi, Chiyuan Zhang · 26 Aug 2020

HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks
Sam Verboven, M. H. Chaudhary, Jeroen Berrevoets, Wouter Verbeke · 26 Aug 2020

Noise-induced degeneration in online learning
Yuzuru Sato, Daiji Tsutsui, A. Fujiwara · 24 Aug 2020

XNAP: Making LSTM-based Next Activity Predictions Explainable by Using LRP
Sven Weinzierl, Sandra Zilker, Jens Brunk, K. Revoredo, Martin Matzner, J. Becker · 18 Aug 2020

Adversarial Concurrent Training: Optimizing Robustness and Accuracy Trade-off of Deep Neural Networks
Elahe Arani, F. Sarfraz, Bahram Zonooz · AAML · 16 Aug 2020

Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training
Geoffrey X. Yu, Tovi Grossman, Gennady Pekhimenko · 15 Aug 2020

BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition
Y. Kim, Wonpyo Park, Jongju Shin · CVBM · 15 Aug 2020

Optimizing Information Loss Towards Robust Neural Networks
Philip Sperl, Konstantin Böttinger · AAML · 07 Aug 2020

Neural Complexity Measures
Yoonho Lee, Juho Lee, Sung Ju Hwang, Eunho Yang, Seungjin Choi · 07 Aug 2020

Communication-Efficient and Distributed Learning Over Wireless Networks: Principles and Applications
Jihong Park, S. Samarakoon, Anis Elgabli, Joongheon Kim, M. Bennis, Seong-Lyun Kim, Mérouane Debbah · 06 Aug 2020

Wasserstein-based Projections with Applications to Inverse Problems
Howard Heaton, Samy Wu Fung, A. Lin, Stanley Osher, W. Yin · 05 Aug 2020

Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry
Yossi Arjevani, M. Field · 04 Aug 2020

MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks
Jun Shu, Yanwen Zhu, Qian Zhao, Zongben Xu, Deyu Meng · 29 Jul 2020

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li · ODL · 28 Jul 2020

AutoClip: Adaptive Gradient Clipping for Source Separation Networks
Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux · 25 Jul 2020

Neural networks with late-phase weights
J. Oswald, Seijin Kobayashi, Alexander Meulemans, Christian Henning, Benjamin Grewe, João Sacramento · 25 Jul 2020

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
Yosuke Oyama, N. Maruyama, Nikoli Dryden, Erin McCarthy, P. Harrington, J. Balewski, Satoshi Matsuoka, Peter Nugent, B. Van Essen · 3DV · AI4CE · 25 Jul 2020

Linear discriminant initialization for feed-forward neural networks
Marissa Masden, D. Sinha · FedML · 24 Jul 2020

Deforming the Loss Surface
Liangming Chen, Long Jin, Xiujuan Du, Shuai Li, Mei Liu · ODL · 24 Jul 2020

Randomized Automatic Differentiation
Deniz Oktay, N. McGreivy, Joshua Aduol, Alex Beatson, Ryan P. Adams · ODL · 20 Jul 2020

On regularization of gradient descent, layer imbalance and flat minima
Boris Ginsburg · 18 Jul 2020

Understanding Implicit Regularization in Over-Parameterized Single Index Model
Jianqing Fan, Zhuoran Yang, Mengxin Yu · 16 Jul 2020

Data-driven effective model shows a liquid-like deep learning
Wenxuan Zou, Haiping Huang · 16 Jul 2020

Explicit Regularisation in Gaussian Noise Injections
A. Camuto, M. Willetts, Umut Simsekli, Stephen J. Roberts, Chris Holmes · 14 Jul 2020

Beyond Graph Neural Networks with Lifted Relational Neural Networks
Gustav Sourek, F. Železný, Ondrej Kuzelka · NAI · 13 Jul 2020

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang, G. Agrawal · 13 Jul 2020

A Study of Gradient Variance in Deep Learning
Fartash Faghri, David Duvenaud, David J. Fleet, Jimmy Ba · FedML · ODL · 09 Jul 2020

Distributed Training of Deep Learning Models: A Taxonomic Perspective
M. Langer, Zhen He, W. Rahayu, Yanbo Xue · 08 Jul 2020

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training
Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan · 07 Jul 2020

Predicting Porosity, Permeability, and Tortuosity of Porous Media from Images by Deep Learning
K. Graczyk, M. Matyka · 3DV · AI4CE · 06 Jul 2020

Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
Yi Chen, Jinglin Chen, Jing-rong Dong, Jian-wei Peng, Zhaoran Wang · 04 Jul 2020

Variance reduction for Riemannian non-convex optimization with batch size adaptation
Andi Han, Junbin Gao · 03 Jul 2020

The Global Landscape of Neural Networks: An Overview
Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant · 02 Jul 2020

Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning
Lionel Blondé, Pablo Strasser, Alexandros Kalousis · 28 Jun 2020

Is SGD a Bayesian sampler? Well, almost
Chris Mingard, Guillermo Valle Pérez, Joar Skalse, A. Louis · BDL · 26 Jun 2020

On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith, Erich Elsen, Soham De · MLT · 26 Jun 2020

Effective Elastic Scaling of Deep Learning Workloads
Vaibhav Saxena, K.R. Jayaram, Saurav Basu, Yogish Sabharwal, Ashish Verma · 24 Jun 2020

Dynamic of Stochastic Gradient Descent with State-Dependent Noise
Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu · 24 Jun 2020

Understanding Deep Architectures with Reasoning Layer
Xinshi Chen, Yufei Zhang, C. Reisinger, Le Song · AI4CE · 24 Jun 2020

Exploiting Contextual Information with Deep Neural Networks
Ismail Elezi · 21 Jun 2020