ResearchTrend.AI
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

Neural Information Processing Systems (NeurIPS), 2014
10 June 2014
Yann N. Dauphin
Razvan Pascanu
Çağlar Gülçehre
Dong Wang
Surya Ganguli
Yoshua Bengio
    ODL

Papers citing "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization"

50 / 631 papers shown
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
International Conference on Learning Representations (ICLR), 2023
Haque Ishfaq
Qingfeng Lan
Pan Xu
A. R. Mahmood
Doina Precup
Anima Anandkumar
Kamyar Azizzadenesheli
BDL
OffRL
284
27
0
29 May 2023
Understanding Predictive Coding as an Adaptive Trust-Region Method
Francesco Innocenti
Ryan Singh
Christopher L. Buckley
294
1
0
29 May 2023
Theoretical and Practical Perspectives on what Influence Functions Do
Neural Information Processing Systems (NeurIPS), 2023
Andrea Schioppa
Katja Filippova
Ivan Titov
Polina Zablotskaia
TDI
154
32
0
26 May 2023
Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function
Linxuan Pan
Shenghui Song
FedML
122
2
0
24 May 2023
The Hessian perspective into the Nature of Convolutional Neural Networks
International Conference on Machine Learning (ICML), 2023
Sidak Pal Singh
Thomas Hofmann
Bernhard Schölkopf
196
11
0
16 May 2023
ASDL: A Unified Interface for Gradient Preconditioning in PyTorch
Kazuki Osawa
Satoki Ishikawa
Rio Yokota
Shigang Li
Torsten Hoefler
ODL
142
19
0
08 May 2023
Random Function Descent
Neural Information Processing Systems (NeurIPS), 2023
Felix Benning
L. Döring
152
1
0
02 May 2023
The R-mAtrIx Net
Shailesh Lal
Suvajit Majumder
E. Sobko
72
6
0
14 Apr 2023
Simulated Annealing in Early Layers Leads to Better Generalization
Computer Vision and Pattern Recognition (CVPR), 2023
Amirm. Sarfi
Zahra Karimpour
Muawiz Chaudhary
N. Khalid
Mirco Ravanelli
Sudhir Mudur
Eugene Belilovsky
AI4CE
CLL
149
10
0
10 Apr 2023
A Novel and Fully Automated Domain Transformation Scheme for Near Optimal Surrogate Construction
J. Bouwer
D. Wilke
S. Kok
38
0
0
30 Mar 2023
Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent
Liu Ziyin
Botao Li
Tomer Galanti
Masakuni Ueda
202
8
0
23 Mar 2023
Revisiting DeepFool: generalization and improvement
Alireza Abdollahpourrostam
Mahed Abroshan
Seyed-Mohsen Moosavi-Dezfooli
AAML
232
2
0
22 Mar 2023
The Cascaded Forward Algorithm for Neural Network Training
Gongpei Zhao
Tao Wang
Yidong Li
Yi Jin
Congyan Lang
Haibin Ling
231
19
0
17 Mar 2023
MELON: NeRF with Unposed Images in SO(3)
International Conference on 3D Vision (3DV), 2023
Axel Levy
Mark J. Matthews
Matan Sela
Gordon Wetzstein
Dmitry Lagun
139
3
0
14 Mar 2023
Complex Clipping for Improved Generalization in Machine Learning
L. Atlas
Nicholas Rasmussen
Felix Schwock
Mert Pilanci
69
0
0
27 Feb 2023
On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca
Yan Wu
Chongli Qin
Benoit Dherin
361
12
0
03 Feb 2023
Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning
Athul Shibu
Abhishek Kumar
Heechul Jung
Dong-Gyu Lee
178
2
0
26 Jan 2023
Exploring Complex Dynamical Systems via Nonconvex Optimization
Hunter L. Elliott
87
0
0
03 Jan 2023
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Neural Information Processing Systems (NeurIPS), 2022
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
R. Venkatesh Babu
182
36
0
28 Dec 2022
Langevin algorithms for very deep Neural Networks with application to image classification
Pierre Bras
147
6
0
27 Dec 2022
Langevin algorithms for Markovian Neural Networks and Deep Stochastic control
IEEE International Joint Conference on Neural Network (IJCNN), 2022
Pierre Bras
Gilles Pagès
156
6
0
22 Dec 2022
Scalable Bayesian Uncertainty Quantification for Neural Network Potentials: Promise and Pitfalls
Journal of Chemical Theory and Computation (JCTC), 2022
Stephan Thaler
Gregor Doehner
Julija Zavadlav
233
24
0
15 Dec 2022
Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast Evasion of Non-Degenerate Saddle Points
IEEE Transactions on Automatic Control (TAC), 2022
Mayank Baranwal
Param Budhraja
V. Raj
A. Hota
205
3
0
07 Dec 2022
On the Overlooked Structure of Stochastic Gradients
Neural Information Processing Systems (NeurIPS), 2022
Zeke Xie
Qian-Yuan Tang
Mingming Sun
P. Li
244
9
0
05 Dec 2022
A survey of deep learning optimizers -- first and second order methods
Rohan Kashyap
ODL
213
12
0
28 Nov 2022
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Neural Information Processing Systems (NeurIPS), 2022
Sanae Lotfi
Marc Finzi
Sanyam Kapoor
Andres Potapczynski
Micah Goldblum
A. Wilson
BDL
MLT
AI4CE
182
73
0
24 Nov 2022
Understanding Sparse Feature Updates in Deep Networks using Iterative Linearisation
Adrian Goldwaser
Hong Ge
MLT
199
0
0
22 Nov 2022
Escaping From Saddle Points Using Asynchronous Coordinate Gradient Descent
Marco Bornstein
Jin-Peng Liu
Jingling Li
Furong Huang
168
1
0
17 Nov 2022
Selecting and Composing Learning Rate Policies for Deep Neural Networks
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2022
Yanzhao Wu
Ling Liu
126
32
0
24 Oct 2022
On amortizing convex conjugates for optimal transport
International Conference on Learning Representations (ICLR), 2022
Brandon Amos
OT
296
32
0
21 Oct 2022
NCVX: A General-Purpose Optimization Solver for Constrained Machine and Deep Learning
Buyun Liang
Tim Mitchell
Ju Sun
OOD
344
11
0
03 Oct 2022
Model Zoos: A Dataset of Diverse Populations of Neural Network Models
Neural Information Processing Systems (NeurIPS), 2022
Konstantin Schurholt
Diyar Taskiran
Boris Knyazev
Xavier Giró-i-Nieto
Damian Borth
285
35
0
29 Sep 2022
Visualizing high-dimensional loss landscapes with Hessian directions
Journal of Statistical Mechanics: Theory and Experiment (JSTAT), 2022
Lucas Böttcher
Gregory R. Wheeler
210
19
0
28 Aug 2022
Ab-initio quantum chemistry with neural-network wavefunctions
Nature Reviews Chemistry (Nat. Rev. Chem.), 2022
J. Hermann
J. Spencer
Kenny Choo
Antonio Mezzacapo
W. Foulkes
David Pfau
Giuseppe Carleo
Frank Noé
AI4CE
173
110
0
26 Aug 2022
Parameter Averaging for Feature Ranking
Talip Uçar
Ehsan Hajiramezanali
85
0
0
05 Aug 2022
Gradient descent provably escapes saddle points in the training of shallow ReLU networks
Journal of Optimization Theory and Applications (JOTA), 2022
Patrick Cheridito
Arnulf Jentzen
Florian Rossmannek
218
8
0
03 Aug 2022
On the Relationship Between Adversarial Robustness and Decision Region in Deep Neural Networks
Seongjin Park
Haedong Jeong
Tair Djanibekov
Giyoung Jeon
Jinseok Seol
Jaesik Choi
AAML
240
1
0
07 Jul 2022
An Empirical Study of Implicit Regularization in Deep Offline RL
Çağlar Gülçehre
Srivatsan Srinivasan
Jakub Sygnowski
Georg Ostrovski
Mehrdad Farajtabar
Matt Hoffman
Razvan Pascanu
Arnaud Doucet
OffRL
266
19
0
05 Jul 2022
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Neural Information Processing Systems (NeurIPS), 2022
Bowen Baker
Ilge Akkaya
Peter Zhokhov
Joost Huizinga
Jie Tang
Adrien Ecoffet
Brandon Houghton
Raul Sampedro
Jeff Clune
OffRL
422
359
0
23 Jun 2022
Parameter Convex Neural Networks
Jingcheng Zhou
Wei Wei
Xing Li
Bowen Pang
Zhiming Zheng
73
1
0
11 Jun 2022
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
Neural Information Processing Systems (NeurIPS), 2022
Etienne Boursier
Loucas Pillaud-Vivien
Nicolas Flammarion
ODL
252
73
0
02 Jun 2022
Decoupling multivariate functions using a nonparametric filtered tensor decomposition
Mechanical systems and signal processing (MSSP), 2022
J. Decuyper
K. Tiels
S. Weiland
M. Runacres
J. Schoukens
64
3
0
23 May 2022
Training neural networks using Metropolis Monte Carlo and an adaptive variant
S. Whitelam
V. Selin
Ian Benlolo
Corneel Casert
Isaac Tamblyn
BDL
197
10
0
16 May 2022
Gradient Descent, Stochastic Optimization, and Other Tales
Jun Lu
106
15
0
02 May 2022
FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks
Aleksandar Doknic
Torsten Moller
152
2
0
09 Apr 2022
Deep learning, stochastic gradient descent and diffusion maps
Journal of Computational Mathematics and Data Science (JCMDS), 2022
Carmina Fjellström
Kaj Nyström
DiffM
186
17
0
04 Apr 2022
AdaSmooth: An Adaptive Learning Rate Method based on Effective Ratio
Jun Lu
ODL
129
5
0
02 Apr 2022
Random matrix analysis of deep neural network weight matrices
Physical Review E (Phys. Rev. E), 2022
M. Thamm
Max Staats
B. Rosenow
166
20
0
28 Mar 2022
The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022
Jessica Hullman
Sayash Kapoor
Priyanka Nanayakkara
Andrew Gelman
Arvind Narayanan
435
42
0
12 Mar 2022
On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems
Constructive approximation (Constr. Approx.), 2022
C. Christof
Julia Kowalczyk
251
9
0
23 Feb 2022