1611.04231
Identity Matters in Deep Learning
14 November 2016
Moritz Hardt
Tengyu Ma
OOD
Papers citing
"Identity Matters in Deep Learning"
50 / 70 papers shown
Title
Minimisation of Quasar-Convex Functions Using Random Zeroth-Order Oracles
Amir Ali Farzin
Yuen-Man Pun
Iman Shames
31
0
0
04 May 2025
Stacking as Accelerated Gradient Descent
Naman Agarwal
Pranjal Awasthi
Satyen Kale
Eric Zhao
ODL
65
2
0
20 Feb 2025
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
Raphael Barboni
Gabriel Peyré
François-Xavier Vialard
32
3
0
19 Mar 2024
Neural Parameter Regression for Explicit Representations of PDE Solution Operators
Konrad Mundinger
Max Zimmer
S. Pokutta
42
0
0
19 Mar 2024
Lifted RDT based capacity analysis of the 1-hidden layer treelike sign perceptrons neural networks
M. Stojnic
22
1
0
13 Dec 2023
Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds
M. Stojnic
16
4
0
13 Dec 2023
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka
Issei Sato
29
16
0
26 Jul 2023
Memorization Capacity of Neural Networks with Conditional Computation
Erdem Koyuncu
30
4
0
20 Mar 2023
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
Pierre Bréchet
Katerina Papagiannouli
Jing An
Guido Montúfar
23
3
0
06 Mar 2023
Maximally Compact and Separated Features with Regular Polytope Networks
F. Pernici
Matteo Bruni
C. Baecchi
A. Bimbo
12
19
0
15 Jan 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang
Boyi Liu
Qi Cai
Lingxiao Wang
Zhaoran Wang
45
11
0
30 Dec 2022
Task Discovery: Finding the Tasks that Neural Networks Generalize on
Andrei Atanov
Andrei Filatov
Teresa Yeo
Ajay Sohmshetty
Amir Zamir
OOD
40
10
0
01 Dec 2022
SML: Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
Wenlong Deng
Lang Lang
Z. Liu
B. Liu
21
0
0
09 Oct 2022
Stability and Generalization for Markov Chain Stochastic Gradient Methods
Puyu Wang
Yunwen Lei
Yiming Ying
Ding-Xuan Zhou
16
18
0
16 Sep 2022
Transforming PageRank into an Infinite-Depth Graph Neural Network
Andreas Roth
Thomas Liebig
GNN
34
13
0
01 Jul 2022
From Perception to Programs: Regularize, Overparameterize, and Amortize
Hao Tang
Kevin Ellis
NAI
22
10
0
13 Jun 2022
Randomly Initialized One-Layer Neural Networks Make Data Linearly Separable
Promit Ghosal
Srinath Mahankali
Yihang Sun
MLT
17
4
0
24 May 2022
Statistical Guarantees for Approximate Stationary Points of Simple Neural Networks
Mahsa Taheri
Fang Xie
Johannes Lederer
21
0
0
09 May 2022
Sharper Utility Bounds for Differentially Private Models
Yilin Kang
Yong Liu
Jian Li
Weiping Wang
FedML
23
3
0
22 Apr 2022
Convergence of gradient descent for deep neural networks
S. Chatterjee
ODL
19
20
0
30 Mar 2022
Architecture Matters in Continual Learning
Seyed Iman Mirzadeh
Arslan Chaudhry
Dong Yin
Timothy Nguyen
Razvan Pascanu
Dilan Görür
Mehrdad Farajtabar
OOD
KELM
114
58
0
01 Feb 2022
Designing Universal Causal Deep Learning Models: The Geometric (Hyper)Transformer
Beatrice Acciaio
Anastasis Kratsios
G. Pammer
OOD
39
20
0
31 Jan 2022
Stochastic Neural Networks with Infinite Width are Deterministic
Liu Ziyin
Hanlin Zhang
Xiangming Meng
Yuting Lu
Eric P. Xing
Masahito Ueda
21
3
0
30 Jan 2022
Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks
Bartlomiej Polaczyk
J. Cyranka
ODL
30
3
0
28 Jan 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
25
34
0
20 Jan 2022
FDGATII : Fast Dynamic Graph Attention with Initial Residual and Identity Mapping
Gayan K. Kulatilleke
Marius Portmann
Ryan K. L. Ko
Shekhar S. Chandra
17
9
0
21 Oct 2021
The loss landscape of deep linear neural networks: a second-order analysis
E. M. Achour
François Malgouyres
Sébastien Gerchinovitz
ODL
22
9
0
28 Jul 2021
Improved Learning Rates for Stochastic Optimization: Two Theoretical Viewpoints
Shaojie Li
Yong Liu
15
13
0
19 Jul 2021
Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent
Spencer Frei
Quanquan Gu
15
25
0
25 Jun 2021
Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis
Martin Pawelczyk
Chirag Agarwal
Shalmali Joshi
Sohini Upadhyay
Himabindu Lakkaraju
AAML
11
51
0
18 Jun 2021
A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization
Adepu Ravi Sankar
Yash Khasbage
Rahul Vigneswaran
V. Balasubramanian
22
41
0
07 Dec 2020
Learning Graph Neural Networks with Approximate Gradient Descent
Qunwei Li
Shaofeng Zou
Leon Wenliang Zhong
GNN
27
1
0
07 Dec 2020
Expressivity of Deep Neural Networks
Ingo Gühring
Mones Raslan
Gitta Kutyniok
16
50
0
09 Jul 2020
The Depth-to-Width Interplay in Self-Attention
Yoav Levine
Noam Wies
Or Sharir
Hofit Bata
Amnon Shashua
19
45
0
22 Jun 2020
Implicit Regularization in Deep Learning May Not Be Explainable by Norms
Noam Razin
Nadav Cohen
16
155
0
13 May 2020
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Yiping Lu
Chao Ma
Yulong Lu
Jianfeng Lu
Lexing Ying
MLT
31
78
0
11 Mar 2020
Memory capacity of neural networks with threshold and ReLU activations
Roman Vershynin
23
21
0
20 Jan 2020
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu
Qingcan Wang
Chao Ma
ODL
AI4CE
20
22
0
02 Nov 2019
Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
Yu Bai
J. Lee
11
116
0
03 Oct 2019
Residual Networks Behave Like Boosting Algorithms
Chapman Siu
14
9
0
25 Sep 2019
Optimal Function Approximation with Relu Neural Networks
Bo Liu
Yi Liang
25
33
0
09 Sep 2019
Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets
Amir-Reza Asadi
Emmanuel Abbe
BDL
AI4CE
23
13
0
26 Jun 2019
Implicit Regularization in Deep Matrix Factorization
Sanjeev Arora
Nadav Cohen
Wei Hu
Yuping Luo
AI4CE
24
491
0
31 May 2019
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora
S. Du
Wei Hu
Zhiyuan Li
Ruosong Wang
MLT
35
961
0
24 Jan 2019
Width Provably Matters in Optimization for Deep Linear Neural Networks
S. Du
Wei Hu
13
93
0
24 Jan 2019
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou
Yuan Cao
Dongruo Zhou
Quanquan Gu
ODL
11
446
0
21 Nov 2018
Gradient Descent Finds Global Minima of Deep Neural Networks
S. Du
J. Lee
Haochuan Li
Liwei Wang
M. Tomizuka
ODL
15
1,120
0
09 Nov 2018
A Closer Look at Deep Policy Gradients
Andrew Ilyas
Logan Engstrom
Shibani Santurkar
Dimitris Tsipras
Firdaus Janoos
Larry Rudolph
Aleksander Madry
22
50
0
06 Nov 2018
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Chulhee Yun
S. Sra
Ali Jadbabaie
13
117
0
17 Oct 2018
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
Sanjeev Arora
Nadav Cohen
Noah Golowich
Wei Hu
8
280
0
04 Oct 2018