Identity Matters in Deep Learning

Moritz Hardt, Tengyu Ma
14 November 2016 · OOD

Papers citing "Identity Matters in Deep Learning"

50 / 70 papers shown

Minimisation of Quasar-Convex Functions Using Random Zeroth-Order Oracles
Amir Ali Farzin, Yuen-Man Pun, Iman Shames
04 May 2025

Stacking as Accelerated Gradient Descent
Naman Agarwal, Pranjal Awasthi, Satyen Kale, Eric Zhao
20 Feb 2025 · ODL

Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
Raphael Barboni, Gabriel Peyré, François-Xavier Vialard
19 Mar 2024

Neural Parameter Regression for Explicit Representations of PDE Solution Operators
Konrad Mundinger, Max Zimmer, S. Pokutta
19 Mar 2024

Lifted RDT based capacity analysis of the 1-hidden layer treelike sign perceptrons neural networks
M. Stojnic
13 Dec 2023

Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds
M. Stojnic
13 Dec 2023

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato
26 Jul 2023

Memorization Capacity of Neural Networks with Conditional Computation
Erdem Koyuncu
20 Mar 2023

Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
Pierre Bréchet, Katerina Papagiannouli, Jing An, Guido Montúfar
06 Mar 2023

Maximally Compact and Separated Features with Regular Polytope Networks
F. Pernici, Matteo Bruni, C. Baecchi, A. Bimbo
15 Jan 2023

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang
30 Dec 2022

Task Discovery: Finding the Tasks that Neural Networks Generalize on
Andrei Atanov, Andrei Filatov, Teresa Yeo, Ajay Sohmshetty, Amir Zamir
01 Dec 2022 · OOD

SML: Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
Wenlong Deng, Lang Lang, Z. Liu, B. Liu
09 Oct 2022

Stability and Generalization for Markov Chain Stochastic Gradient Methods
Puyu Wang, Yunwen Lei, Yiming Ying, Ding-Xuan Zhou
16 Sep 2022

Transforming PageRank into an Infinite-Depth Graph Neural Network
Andreas Roth, Thomas Liebig
01 Jul 2022 · GNN

From Perception to Programs: Regularize, Overparameterize, and Amortize
Hao Tang, Kevin Ellis
13 Jun 2022 · NAI

Randomly Initialized One-Layer Neural Networks Make Data Linearly Separable
Promit Ghosal, Srinath Mahankali, Yihang Sun
24 May 2022 · MLT

Statistical Guarantees for Approximate Stationary Points of Simple Neural Networks
Mahsa Taheri, Fang Xie, Johannes Lederer
09 May 2022

Sharper Utility Bounds for Differentially Private Models
Yilin Kang, Yong Liu, Jian Li, Weiping Wang
22 Apr 2022 · FedML

Convergence of gradient descent for deep neural networks
S. Chatterjee
30 Mar 2022 · ODL

Architecture Matters in Continual Learning
Seyed Iman Mirzadeh, Arslan Chaudhry, Dong Yin, Timothy Nguyen, Razvan Pascanu, Dilan Görür, Mehrdad Farajtabar
01 Feb 2022 · OOD, KELM

Designing Universal Causal Deep Learning Models: The Geometric (Hyper)Transformer
Beatrice Acciaio, Anastasis Kratsios, G. Pammer
31 Jan 2022 · OOD

Stochastic Neural Networks with Infinite Width are Deterministic
Liu Ziyin, Hanlin Zhang, Xiangming Meng, Yuting Lu, Eric P. Xing, Masakuni Ueda
30 Jan 2022

Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks
Bartlomiej Polaczyk, J. Cyranka
28 Jan 2022 · ODL

Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla, Jing Wang, A. Choromańska
20 Jan 2022

FDGATII: Fast Dynamic Graph Attention with Initial Residual and Identity Mapping
Gayan K. Kulatilleke, Marius Portmann, Ryan K. L. Ko, Shekhar S. Chandra
21 Oct 2021

The loss landscape of deep linear neural networks: a second-order analysis
E. M. Achour, François Malgouyres, Sébastien Gerchinovitz
28 Jul 2021 · ODL

Improved Learning Rates for Stochastic Optimization: Two Theoretical Viewpoints
Shaojie Li, Yong Liu
19 Jul 2021

Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent
Spencer Frei, Quanquan Gu
25 Jun 2021

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis
Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju
18 Jun 2021 · AAML

A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization
Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, V. Balasubramanian
07 Dec 2020

Learning Graph Neural Networks with Approximate Gradient Descent
Qunwei Li, Shaofeng Zou, Leon Wenliang Zhong
07 Dec 2020 · GNN

Expressivity of Deep Neural Networks
Ingo Gühring, Mones Raslan, Gitta Kutyniok
09 Jul 2020

The Depth-to-Width Interplay in Self-Attention
Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua
22 Jun 2020

Implicit Regularization in Deep Learning May Not Be Explainable by Norms
Noam Razin, Nadav Cohen
13 May 2020

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Yiping Lu, Chao Ma, Yulong Lu, Jianfeng Lu, Lexing Ying
11 Mar 2020 · MLT

Memory capacity of neural networks with threshold and ReLU activations
Roman Vershynin
20 Jan 2020

Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu, Qingcan Wang, Chao Ma
02 Nov 2019 · ODL, AI4CE

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
Yu Bai, J. Lee
03 Oct 2019

Residual Networks Behave Like Boosting Algorithms
Chapman Siu
25 Sep 2019

Optimal Function Approximation with Relu Neural Networks
Bo Liu, Yi Liang
09 Sep 2019

Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets
Amir-Reza Asadi, Emmanuel Abbe
26 Jun 2019 · BDL, AI4CE

Implicit Regularization in Deep Matrix Factorization
Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo
31 May 2019 · AI4CE

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
24 Jan 2019 · MLT

Width Provably Matters in Optimization for Deep Linear Neural Networks
S. Du, Wei Hu
24 Jan 2019

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu
21 Nov 2018 · ODL

Gradient Descent Finds Global Minima of Deep Neural Networks
S. Du, J. Lee, Haochuan Li, Liwei Wang, M. Tomizuka
09 Nov 2018 · ODL

A Closer Look at Deep Policy Gradients
Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry
06 Nov 2018

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Chulhee Yun, S. Sra, Ali Jadbabaie
17 Oct 2018

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu
04 Oct 2018