On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown

| Title | Authors | Tags | | Date |
|---|---|---|---|---|
| Smoothness Analysis of Adversarial Training | Sekitoshi Kanai, Masanori Yamada, Hiroshi Takahashi, Yuki Yamanaka, Yasutoshi Ida | AAML | 95 6 0 | 02 Mar 2021 |
| Acceleration via Fractal Learning Rate Schedules | Naman Agarwal, Surbhi Goel, Cyril Zhang | | 76 18 0 | 01 Mar 2021 |
| Siamese Labels Auxiliary Learning | Wenrui Gan, Zhulin Liu, Chong Chen, Tong Zhang | | 35 2 0 | 27 Feb 2021 |
| Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability | Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar | ODL | 131 279 0 | 26 Feb 2021 |
| Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling | Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, A. Wilson | UQCV | 126 70 0 | 25 Feb 2021 |
| On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) | Zhiyuan Li, Sadhika Malladi, Sanjeev Arora | | 104 80 0 | 24 Feb 2021 |
| Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization | Tianyi Liu, Yan Li, S. Wei, Enlu Zhou, T. Zhao | | 65 13 0 | 24 Feb 2021 |
| Inductive Bias of Multi-Channel Linear Convolutional Networks with Bounded Weight Norm | Meena Jagadeesan, Ilya P. Razenshteyn, Suriya Gunasekar | | 113 21 0 | 24 Feb 2021 |
| The Promises and Pitfalls of Deep Kernel Learning | Sebastian W. Ober, C. Rasmussen, Mark van der Wilk | UQCV BDL | 82 109 0 | 24 Feb 2021 |
| ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks | Jungmin Kwon, Jeongseop Kim, Hyunseong Park, I. Choi | | 124 291 0 | 23 Feb 2021 |
| The Uncanny Similarity of Recurrence and Depth | Avi Schwarzschild, Arjun Gupta, Amin Ghiasi, Micah Goldblum, Tom Goldstein | | 83 10 0 | 22 Feb 2021 |
| Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation | Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu Sun | | 62 11 0 | 22 Feb 2021 |
| Non-Convex Optimization with Spectral Radius Regularization | Adam Sandler, Diego Klabjan, Yuan Luo | ODL | 45 1 0 | 22 Feb 2021 |
| Formal Language Theory Meets Modern NLP | William Merrill | AI4CE NAI | 112 13 0 | 19 Feb 2021 |
| Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence | Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, P. Tsigas | | 67 5 0 | 17 Feb 2021 |
| SWAD: Domain Generalization by Seeking Flat Minima | Junbum Cha, Sanghyuk Chun, Kyungjae Lee, Han-Cheol Cho, Seunghyun Park, Yunsung Lee, Sungrae Park | MoMe | 311 460 0 | 17 Feb 2021 |
| Generating Structured Adversarial Attacks Using Frank-Wolfe Method | Ehsan Kazemi, Thomas Kerdreux, Liquang Wang | AAML DiffM | 48 1 0 | 15 Feb 2021 |
| Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks | Frank Schneider, Felix Dangel, Philipp Hennig | | 74 10 0 | 12 Feb 2021 |
| Noisy Recurrent Neural Networks | Soon Hoe Lim, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney | | 93 54 0 | 09 Feb 2021 |
| Consensus Control for Decentralized Deep Learning | Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich | | 53 79 0 | 09 Feb 2021 |
| SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality | Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette | | 59 35 0 | 08 Feb 2021 |
| Eliminating Sharp Minima from SGD with Truncated Heavy-tailed Noise | Xingyu Wang, Sewoong Oh, C. Rhee | | 75 17 0 | 08 Feb 2021 |
| Adversarial Training Makes Weight Loss Landscape Sharper in Logistic Regression | Masanori Yamada, Sekitoshi Kanai, Tomoharu Iwata, Tomokatsu Takahashi, Yuki Yamanaka, Hiroshi Takahashi, Atsutoshi Kumagai | AAML | 124 9 0 | 05 Feb 2021 |
| Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models | Shang Wang, Peiming Yang, Yuxuan Zheng, Xuelong Li, Gennady Pekhimenko | | 82 22 0 | 03 Feb 2021 |
| Information-Theoretic Generalization Bounds for Stochastic Gradient Descent | Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy | | 128 90 0 | 01 Feb 2021 |
| Exploring the Geometry and Topology of Neural Network Loss Landscapes | Stefan Horoi, Je-chun Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy | | 45 13 0 | 31 Jan 2021 |
| Modelling Sovereign Credit Ratings: Evaluating the Accuracy and Driving Factors using Machine Learning Techniques | B. Overes, Michel van der Wel | | 17 6 0 | 29 Jan 2021 |
| On the Origin of Implicit Regularization in Stochastic Gradient Descent | Samuel L. Smith, Benoit Dherin, David Barrett, Soham De | MLT | 62 204 0 | 28 Jan 2021 |
| cGANs for Cartoon to Real-life Images | P. Rajput, Kanya Satis, Sonnya Dellarosa, Wenxuan Huang, Obinna Agba | GAN | 55 2 0 | 24 Jan 2021 |
| Predicting the Mechanical Properties of Biopolymer Gels Using Neural Networks Trained on Discrete Fiber Network Data | Yue Leng, Vahidullah Tac, S. Calve, A. B. Tepole | | 86 32 0 | 23 Jan 2021 |
| A shallow neural model for relation prediction | Caglar Demir, Diego Moussallem, A. N. Ngomo | | 45 11 0 | 22 Jan 2021 |
| Robustness to Augmentations as a Generalization metric | Sumukh K Aithal, D. Kashyap, Natarajan Subramanyam | OOD | 36 18 0 | 16 Jan 2021 |
| BN-invariant sharpness regularizes the training model to better generalization | Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu | | 128 3 0 | 08 Jan 2021 |
| Accelerating Training of Batch Normalization: A Manifold Perspective | Mingyang Yi | | 24 3 0 | 08 Jan 2021 |
| A spin-glass model for the loss surfaces of generative adversarial networks | Nicholas P. Baskerville, J. Keating, F. Mezzadri, J. Najnudel | GAN | 88 12 0 | 07 Jan 2021 |
| Topological obstructions in neural networks learning | S. Barannikov, Daria Voronkova, I. Trofimov, Alexander Korotin, Grigorii Sotnikov, Evgeny Burnaev | | 39 6 0 | 31 Dec 2020 |
| Optimizing Deeper Transformers on Small Datasets | Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie C.K. Cheung, S. Prince, Yanshuai Cao | AI4CE | 109 69 0 | 30 Dec 2020 |
| Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability | Sangho Yeo, Minho Bae, Minjoong Jeong, Oh-Kyoung Kwon, Sangyoon Oh | | 57 3 0 | 30 Dec 2020 |
| Mathematical Models of Overparameterized Neural Networks | Cong Fang, Hanze Dong, Tong Zhang | | 181 23 0 | 27 Dec 2020 |
| Understanding Decoupled and Early Weight Decay | Johan Bjorck, Kilian Q. Weinberger, Carla P. Gomes | | 61 25 0 | 27 Dec 2020 |
| Recent advances in deep learning theory | Fengxiang He, Dacheng Tao | AI4CE | 130 51 0 | 20 Dec 2020 |
| Combating Mode Collapse in GAN training: An Empirical Analysis using Hessian Eigenvalues | Ricard Durall, Avraam Chatzimichailidis, P. Labus, J. Keuper | GAN | 77 62 0 | 17 Dec 2020 |
| Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient | Fengli Gao, Huicai Zhong | ODL | 35 10 0 | 16 Dec 2020 |
| DeepLesionBrain: Towards a broader deep-learning generalization for multiple sclerosis lesion segmentation | R. A. Kamraoui, Vinh-Thong Ta, T. Tourdias, Boris Mansencal, J. V. Manjón, Pierrick Coupé | OOD | 120 54 0 | 14 Dec 2020 |
| Warm Starting CMA-ES for Hyperparameter Optimization | Masahiro Nomura, Shuhei Watanabe, Youhei Akimoto, Yoshihiko Ozaki, Masaki Onishi | | 93 43 0 | 13 Dec 2020 |
| Enhance Convolutional Neural Networks with Noise Incentive Block | Menghan Xia, Yi Wang, Chu Han, T. Wong | | 40 1 0 | 09 Dec 2020 |
| Generalization bounds for deep learning | Guillermo Valle Pérez, A. Louis | BDL | 82 45 0 | 07 Dec 2020 |
| A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization | Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, V. Balasubramanian | | 89 44 0 | 07 Dec 2020 |
| Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent | Kangqiao Liu, Liu Ziyin, Masakuni Ueda | MLT | 149 39 0 | 07 Dec 2020 |
| Why Unsupervised Deep Networks Generalize | Anita de Mello Koch, E. Koch, R. Koch | OOD | 44 8 0 | 07 Dec 2020 |