Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
International Conference on Learning Representations (ICLR), 2020
arXiv: 1910.01619 · 3 October 2019
Yu Bai, Jason D. Lee

Papers citing "Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks"

Showing 50 of 80 citing papers.
Convergence of Stochastic Gradient Langevin Dynamics in the Lazy Training Regime
Noah Oberweis, Semih Cayci
24 Oct 2025

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, M. Barkeshli
17 Feb 2025

A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks
Semih Cayci
18 Dec 2024

Learning Expressive Random Feature Models via Parametrized Activations
Zailin Ma, Jiansheng Yang, Yaodong Yang
29 Nov 2024

Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
International Conference on Learning Representations (ICLR), 2024
Hossein Taheri, Christos Thrampoulidis, Arya Mazumdar
Topics: MLT
13 Oct 2024

A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Annual Review of Statistics and Its Application (ARSIA), 2024
Namjoon Suh, Guang Cheng
Topics: MedIm
14 Jan 2024

How Graph Neural Networks Learn: Lessons from Training Dynamics
International Conference on Machine Learning (ICML), 2023
Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan
Topics: AI4CE, GNN
08 Oct 2023

Six Lectures on Linearized Neural Networks
Journal of Statistical Mechanics: Theory and Experiment (J. Stat. Mech.), 2023
Theodor Misiakiewicz, Andrea Montanari
25 Aug 2023

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
International Conference on Machine Learning (ICML), 2023
Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai
13 Jul 2023

Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space
Journal of Machine Learning Research (JMLR), 2023
Zhengdao Chen
03 Jul 2023

Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
International Conference on Machine Learning (ICML), 2023
Mohammed Nowaz Rabbani Chowdhury, Shuai Zhang, Ming Wang, Sijia Liu, Pin-Yu Chen
Topics: MoE
07 Jun 2023

Tight conditions for when the NTK approximation is valid
Enric Boix-Adserà, Etai Littwin
22 May 2023

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks
Neural Information Processing Systems (NeurIPS), 2023
Eshaan Nichani, Alexandru Damian, Jason D. Lee
Topics: MLT
11 May 2023

Towards a Phenomenological Understanding of Neural Networks: Data
S. Tovey, Sven Krippendorf, Konstantin Nikolaou, Daniel Fink
Topics: FedML
01 May 2023

Depth Separation with Multilayer Mean-Field Networks
International Conference on Learning Representations (ICLR), 2023
Y. Ren, Mo Zhou, Rong Ge
Topics: OOD
03 Apr 2023

TRAK: Attributing Model Behavior at Scale
International Conference on Machine Learning (ICML), 2023
Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry
Topics: TDI
24 Mar 2023

Gradient Descent in Neural Networks as Sequential Learning in RKBS
A. Shilton, Sunil R. Gupta, Santu Rana, Svetha Venkatesh
Topics: MLT
01 Feb 2023

Spectral Evolution and Invariance in Linear-width Neural Networks
Neural Information Processing Systems (NeurIPS), 2022
Zhichao Wang, A. Engel, Anand D. Sarwate, Ioana Dumitriu, Tony Chiang
11 Nov 2022

Learning Single-Index Models with Shallow Neural Networks
Neural Information Processing Systems (NeurIPS), 2022
A. Bietti, Joan Bruna, Clayton Sanford, M. Song
27 Oct 2022

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
International Conference on Machine Learning (ICML), 2022
Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
Topics: AI4CE
25 Oct 2022

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work
Neural Information Processing Systems (NeurIPS), 2022
Jiawei Zhang, Yushun Zhang, Mingyi Hong, Tian Ding, Jianfeng Yao
21 Oct 2022

Second-order regression models exhibit progressive sharpening to the edge of stability
International Conference on Machine Learning (ICML), 2022
Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington
10 Oct 2022

Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks
Neural Information Processing Systems (NeurIPS), 2022
Yunwen Lei, Rong Jin, Yiming Ying
Topics: MLT
19 Sep 2022

Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$
R. Gentile, G. Welper
Topics: ODL
17 Sep 2022

Towards Understanding Mixture of Experts in Deep Learning
Zixiang Chen, Yihe Deng, Yue-bo Wu, Quanquan Gu, Yuan-Fang Li
Topics: MLT, MoE
04 Aug 2022

Feature selection with gradient descent on two-layer networks in low-rotation regimes
Matus Telgarsky
Topics: MLT
04 Aug 2022

Neural Networks can Learn Representations with Gradient Descent
Annual Conference on Computational Learning Theory (COLT), 2022
Alexandru Damian, Jason D. Lee, Mahdi Soltanolkotabi
Topics: SSL, MLT
30 Jun 2022

Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming
International Conference on Machine Learning (ICML), 2022
Chuan Wen, Jianing Qian, Jierui Lin, Jiaye Teng, Dinesh Jayaraman, Yang Gao
Topics: AAML
22 Jun 2022

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2022
Tengyu Xu, Yue Wang, Shaofeng Zou, Yingbin Liang
Topics: OffRL
13 Jun 2022

Non-convex online learning via algorithmic equivalence
Neural Information Processing Systems (NeurIPS), 2022
Udaya Ghai, Zhou Lu, Elad Hazan
30 May 2022

Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods
International Conference on Learning Representations (ICLR), 2022
Shunta Akiyama, Taiji Suzuki
30 May 2022

Quadratic models for understanding catapult dynamics of neural networks
International Conference on Learning Representations (ICLR), 2022
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin
24 May 2022

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Neural Information Processing Systems (NeurIPS), 2022
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
Topics: MLT
03 May 2022

On Feature Learning in Neural Networks with Global Convergence Guarantees
International Conference on Learning Representations (ICLR), 2022
Zhengdao Chen, Eric Vanden-Eijnden, Joan Bruna
Topics: MLT
22 Apr 2022

Random Feature Amplification: Feature Learning and Generalization in Neural Networks
Journal of Machine Learning Research (JMLR), 2022
Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett
Topics: MLT
15 Feb 2022

Multi-scale Feature Learning Dynamics: Insights for Double Descent
International Conference on Machine Learning (ICML), 2021
Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie
06 Dec 2021

Optimization-Based Separations for Neural Networks
Annual Conference on Computational Learning Theory (COLT), 2021
Itay Safran, Jason D. Lee
04 Dec 2021

Dynamics of Local Elasticity During Training of Neural Nets
Soham Dan, Anirbit Mukherjee, Avirup Das, Phanideep Gampa
01 Nov 2021

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game
Delin Qu, Jieping Ye, Zhaoran Wang, Zhuoran Yang
Topics: OffRL
19 Oct 2021

Provable Regret Bounds for Deep Online Learning and Control
Xinyi Chen, Edgar Minasyan, Jason D. Lee, Elad Hazan
15 Oct 2021

On the Provable Generalization of Recurrent Neural Networks
Lifu Wang, Bo Shen, Bo Hu, Xing Cao
29 Sep 2021

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
International Conference on Learning Representations (ICLR), 2021
Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu
Topics: MLT, AI4CE
25 Aug 2021

Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel
Neural Information Processing Systems (NeurIPS), 2021
Dominic Richards, Ilja Kuzborskij
27 Jul 2021

Going Beyond Linear RL: Sample Efficient Neural Function Approximation
Baihe Huang, Kaixuan Huang, Sham Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang
14 Jul 2021

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
Neural Information Processing Systems (NeurIPS), 2021
Baihe Huang, Kaixuan Huang, Sham Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang
09 Jul 2021

Understanding Deflation Process in Over-parametrized Tensor Decomposition
Neural Information Processing Systems (NeurIPS), 2021
Rong Ge, Y. Ren, Xiang Wang, Mo Zhou
11 Jun 2021

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective
Neural Information Processing Systems (NeurIPS), 2021
Geoff Pleiss, John P. Cunningham
11 Jun 2021

The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Neural Information Processing Systems (NeurIPS), 2021
Mufan Li, Mihai Nica, Daniel M. Roy
07 Jun 2021

Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
International Conference on Machine Learning (ICML), 2021
Zixin Wen, Yuanzhi Li
Topics: SSL, MLT
31 May 2021

Unintended Effects on Adaptive Learning Rate for Training Neural Network with Output Scale Change
Ryuichi Kanoh, M. Sugiyama
05 Mar 2021