ResearchTrend.AI

arXiv:1712.05577
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions

15 December 2017
George Philipp, D. Song, J. Carbonell
ODL

Papers citing "The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions"

4 / 4 papers shown
1. Self-Supervised Learning of Linear Precoders under Non-Linear PA Distortion for Energy-Efficient Massive MIMO Systems
   Thomas Feys, Xavier Mestre, François Rottenberg
   13 Oct 2022 (11 / 2 / 0)

2. A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks
   Zhaodong Chen, Lei Deng, Bangyan Wang, Guoqi Li, Yuan Xie
   01 Jan 2020 (27 / 28 / 0)

3. Training Deeper Neural Machine Translation Models with Transparent Attention
   Ankur Bapna, M. Chen, Orhan Firat, Yuan Cao, Yonghui Wu
   22 Aug 2018 (29 / 138 / 0)

4. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
   N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
   ODL
   15 Sep 2016 (275 / 2,888 / 0)