ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Hyperplane Arrangements of Trained ConvNets Are Biased
Matteo Gamba, S. Carlsson, Hossein Azizpour, Mårten Björkman
17 Mar 2020

Investigating Generalization in Neural Networks under Optimally Evolved Training Perturbations
Subhajit Chaudhury, T. Yamasaki
14 Mar 2020

Interference and Generalization in Temporal Difference Learning
Emmanuel Bengio, Joelle Pineau, Doina Precup
13 Mar 2020

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Zhenheng Tang, Shaoshuai Shi, Wei Wang, Yue Liu, Xiaowen Chu
10 Mar 2020

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu
09 Mar 2020

AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks
Majed El Helou, Frederike Dumbgen, Sabine Süsstrunk
07 Mar 2020 · CLL, AI4CE

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
06 Mar 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
04 Mar 2020 · ODL
Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
Wesley J. Maddox, Gregory W. Benton, A. Wilson
04 Mar 2020

Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond
Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, B. Kailkhura, Xinyu Lin, Cho-Jui Hsieh
28 Feb 2020 · AAML

The Implicit and Explicit Regularization Effects of Dropout
Colin Wei, Sham Kakade, Tengyu Ma
28 Feb 2020

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization
S. Chatterjee
25 Feb 2020 · ODL, OOD

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 Feb 2020

De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors
A. Banerjee, Tiancong Chen, Yingxue Zhou
23 Feb 2020 · BDL

Communication-Efficient Edge AI: Algorithms and Systems
Yuanming Shi, Kai Yang, Tao Jiang, Jun Zhang, Khaled B. Letaief
22 Feb 2020 · GNN

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
21 Feb 2020
Parallel and distributed asynchronous adaptive stochastic gradient methods
Yangyang Xu, Yibo Xu, Yonggui Yan, Colin Sutcher-Shepard, Leopold Grinberg, Jiewei Chen
21 Feb 2020

Bayesian Deep Learning and a Probabilistic Perspective of Generalization
A. Wilson, Pavel Izmailov
20 Feb 2020 · UQCV, BDL, OOD

Do We Need Zero Training Loss After Achieving Zero Training Error?
Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama
20 Feb 2020 · AI4CE

Revisiting Training Strategies and Generalization Performance in Deep Metric Learning
Karsten Roth, Timo Milbich, Samarth Sinha, Prateek Gupta, Bjorn Ommer, Joseph Paul Cohen
19 Feb 2020

Unique Properties of Flat Minima in Deep Networks
Rotem Mulayoff, T. Michaeli
11 Feb 2020 · ODL

Think Global, Act Local: Relating DNN generalisation and node-level SNR
Paul Norridge
11 Feb 2020

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama
10 Feb 2020 · ODL

Large Batch Training Does Not Need Warmup
Zhouyuan Huo, Bin Gu, Heng-Chiao Huang
04 Feb 2020 · AI4CE, ODL

Optimizing Loss Functions Through Multivariate Taylor Polynomial Parameterization
Santiago Gonzalez, Risto Miikkulainen
31 Jan 2020
The Case for Bayesian Deep Learning
A. Wilson
29 Jan 2020 · UQCV, BDL, OOD

Identifying Mislabeled Data using the Area Under the Margin Ranking
Geoff Pleiss, Tianyi Zhang, Ethan R. Elenberg, Kilian Q. Weinberger
28 Jan 2020 · NoLa

Automatic phantom test pattern classification through transfer learning with deep neural networks
Rafael B. Fricks, Justin Solomon, Ehsan Samei
22 Jan 2020 · MedIm

A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis
Christopher J. Urban, Daniel J. Bauer
22 Jan 2020 · BDL

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
Jinlong Liu, Guo-qing Jiang, Yunzhi Bai, Ting Chen, Huayan Wang
21 Jan 2020 · AI4CE

SEERL: Sample Efficient Ensemble Reinforcement Learning
Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul
15 Jan 2020

Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification
Xin Jin, Cuiling Lan, Wenjun Zeng, Zhibo Chen
15 Jan 2020

Understanding Generalization in Deep Learning via Tensor Methods
Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang
14 Jan 2020

Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation
Madan Ravi Ganesh, Jason J. Corso
13 Jan 2020 · ODL
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta, S. Serrano, D. DeCoste
07 Jan 2020 · MoMe

Relative Flatness and Generalization
Henning Petzka, Michael Kamp, Linara Adilova, C. Sminchisescu, Mario Boley
03 Jan 2020

'Place-cell' emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space
Moshir Harsh, J. Tubiana, Simona Cocco, R. Monasson
30 Dec 2019

CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity
Konpat Preechakul, B. Kijsirikul
24 Dec 2019 · ODL

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks
Aleksandr Shevchenko, Marco Mondelli
20 Dec 2019

Analysis of Video Feature Learning in Two-Stream CNNs on the Example of Zebrafish Swim Bout Classification
Bennet Breier, A. Onken
20 Dec 2019

Optimization for deep learning: theory and algorithms
Ruoyu Sun
19 Dec 2019 · ODL

Tangent Space Separability in Feedforward Neural Networks
Balint Daroczy, Rita Aleksziev, András A. Benczúr
18 Dec 2019

Learning under Model Misspecification: Applications to Variational and Ensemble methods
A. Masegosa
18 Dec 2019

On the Bias-Variance Tradeoff: Textbooks Need an Update
Brady Neal
17 Dec 2019
Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
11 Dec 2019 · MoMe

Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability
Shuaicheng Liu, Ze Zhang, Kai Song, B. Zeng
10 Dec 2019

InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers
T. Nguyen, Animesh Garg, Richard G. Baraniuk, Anima Anandkumar
09 Dec 2019 · TPM

Observational Overfitting in Reinforcement Learning
Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur
06 Dec 2019 · OffRL

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio
04 Dec 2019 · AI4CE

The Group Loss for Deep Metric Learning
Ismail Elezi, Sebastiano Vascon, Alessandro Torcinovich, Marcello Pelillo, Laura Leal-Taixe
01 Dec 2019