Identity Matters in Deep Learning

14 November 2016

Papers citing "Identity Matters in Deep Learning"

20 / 70 papers shown

Title | Authors | Tags | Counts | Date
--- | --- | --- | --- | ---
Gradient Descent Provably Optimizes Over-parameterized Neural Networks | S. Du, Xiyu Zhai, Barnabás Póczos, Aarti Singh | MLT, ODL | 33 1,251 0 | 04 Oct 2018
Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks | Ohad Shamir | | 22 45 0 | 23 Sep 2018
Training Deeper Neural Machine Translation Models with Transparent Attention | Ankur Bapna, M. Chen, Orhan Firat, Yuan Cao, Yonghui Wu | | 29 138 0 | 22 Aug 2018
ResNet with one-neuron hidden layers is a Universal Approximator | Hongzhou Lin, Stefanie Jegelka | | 28 227 0 | 28 Jun 2018
Learning One-hidden-layer ReLU Networks via Gradient Descent | Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu | MLT | 26 134 0 | 20 Jun 2018
Understanding Batch Normalization | Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger | | 11 592 0 | 01 Jun 2018
How Does Batch Normalization Help Optimization? | Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, A. Madry | ODL | 16 1,521 0 | 29 May 2018
Adding One Neuron Can Eliminate All Bad Local Minima | Shiyu Liang, Ruoyu Sun, J. Lee, R. Srikant | | 29 89 0 | 22 May 2018
How Many Samples are Needed to Estimate a Convolutional or Recurrent Neural Network? | S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Aarti Singh | SSL | 13 57 0 | 21 May 2018
Improved Learning of One-hidden-layer Convolutional Neural Networks with Overlaps | S. Du, Surbhi Goel | MLT | 20 17 0 | 20 May 2018
Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks | Peter L. Bartlett, D. Helmbold, Philip M. Long | | 23 116 0 | 16 Feb 2018
Deep Neural Nets with Interpolating Function as Output Activation | Bao Wang, Xiyang Luo, Z. Li, Wei-wei Zhu, Zuoqiang Shi, Stanley J. Osher | | 20 3 0 | 01 Feb 2018
Fix your classifier: the marginal value of training the last weight layer | Elad Hoffer, Itay Hubara, Daniel Soudry | | 24 101 0 | 14 Jan 2018
Visualizing the Loss Landscape of Neural Nets | Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein | | 63 1,842 0 | 28 Dec 2017
Global optimality conditions for deep neural networks | Chulhee Yun, S. Sra, Ali Jadbabaie | | 121 117 0 | 08 Jul 2017
Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization | Qunwei Li, Yi Zhou, Yingbin Liang, P. Varshney | | 18 94 0 | 14 May 2017
Skip Connections Eliminate Singularities | Emin Orhan, Xaq Pitkow | | 28 25 0 | 31 Jan 2017
Removal of Batch Effects using Distribution-Matching Residual Networks | Uri Shaham, Kelly P. Stanton, Jun Zhao, Huamin Li, K. Raddassi, Ruth R. Montgomery, Y. Kluger | | 16 159 0 | 13 Oct 2016
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition | Hamed Karimi, J. Nutini, Mark W. Schmidt | | 130 1,198 0 | 16 Aug 2016
The Loss Surfaces of Multilayer Networks | A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun | ODL | 179 1,185 0 | 30 Nov 2014