Identity Matters in Deep Learning (arXiv:1611.04231)
Moritz Hardt, Tengyu Ma
14 November 2016
Papers citing "Identity Matters in Deep Learning"

Showing 20 of 70 citing papers.

  • Gradient Descent Provably Optimizes Over-parameterized Neural Networks. S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh. 04 Oct 2018.
  • Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks. Ohad Shamir. 23 Sep 2018.
  • Training Deeper Neural Machine Translation Models with Transparent Attention. Ankur Bapna, M. Chen, Orhan Firat, Yuan Cao, Yonghui Wu. 22 Aug 2018.
  • ResNet with one-neuron hidden layers is a Universal Approximator. Hongzhou Lin, Stefanie Jegelka. 28 Jun 2018.
  • Learning One-hidden-layer ReLU Networks via Gradient Descent. Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu. 20 Jun 2018.
  • Understanding Batch Normalization. Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger. 01 Jun 2018.
  • How Does Batch Normalization Help Optimization? Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, A. Madry. 29 May 2018.
  • Adding One Neuron Can Eliminate All Bad Local Minima. Shiyu Liang, Ruoyu Sun, J. Lee, R. Srikant. 22 May 2018.
  • How Many Samples are Needed to Estimate a Convolutional or Recurrent Neural Network? S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Aarti Singh. 21 May 2018.
  • Improved Learning of One-hidden-layer Convolutional Neural Networks with Overlaps. S. Du, Surbhi Goel. 20 May 2018.
  • Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks. Peter L. Bartlett, D. Helmbold, Philip M. Long. 16 Feb 2018.
  • Deep Neural Nets with Interpolating Function as Output Activation. Bao Wang, Xiyang Luo, Z. Li, Wei-wei Zhu, Zuoqiang Shi, Stanley J. Osher. 01 Feb 2018.
  • Fix your classifier: the marginal value of training the last weight layer. Elad Hoffer, Itay Hubara, Daniel Soudry. 14 Jan 2018.
  • Visualizing the Loss Landscape of Neural Nets. Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. 28 Dec 2017.
  • Global optimality conditions for deep neural networks. Chulhee Yun, S. Sra, Ali Jadbabaie. 08 Jul 2017.
  • Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization. Qunwei Li, Yi Zhou, Yingbin Liang, P. Varshney. 14 May 2017.
  • Skip Connections Eliminate Singularities. Emin Orhan, Xaq Pitkow. 31 Jan 2017.
  • Removal of Batch Effects using Distribution-Matching Residual Networks. Uri Shaham, Kelly P. Stanton, Jun Zhao, Huamin Li, K. Raddassi, Ruth R. Montgomery, Y. Kluger. 13 Oct 2016.
  • Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. Hamed Karimi, J. Nutini, Mark W. Schmidt. 16 Aug 2016.
  • The Loss Surfaces of Multilayer Networks. A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun. 30 Nov 2014.