Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

Neural Information Processing Systems (NeurIPS), 2014
10 June 2014
Yann N. Dauphin
Razvan Pascanu
Çağlar Gülçehre
Dong Wang
Surya Ganguli
Yoshua Bengio
    ODL
arXiv:1406.2572

Papers citing "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization"

50 / 631 papers shown
A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
Min Gan
Guang-yong Chen
Yang Yi
Lin Yang
60
0
0
03 Nov 2025
Non-Singularity of the Gradient Descent map for Neural Networks with Piecewise Analytic Activations
Alexandru Crăciun
Debarghya Ghoshdastidar
MLT
76
0
0
28 Oct 2025
Nonlinear discretizations and Newton's method: characterizing stationary points of regression objectives
Conor Rowan
ODL
188
1
0
13 Oct 2025
Long-tailed Recognition with Model Rebalancing
Jiaan Luo
Feng Hong
Qiang Hu
Xiaofeng Cao
Feng Liu
Jiangchao Yao
136
0
0
09 Oct 2025
AutoBalance: An Automatic Balancing Framework for Training Physics-Informed Neural Networks
Kang An
Chenhao Si
Ming Yan
Shiqian Ma
AI4CE
105
0
0
08 Oct 2025
How Does Preconditioning Guide Feature Learning in Deep Neural Networks?
Kotaro Yoshida
Atsushi Nitanda
170
0
0
30 Sep 2025
Visualization and Analysis of the Loss Landscape in Graph Neural Networks
International Conference on Artificial Neural Networks (ICANN), 2025
Samir Moustafa
Lorenz Kummer
Simon Fetzel
Nils M. Kriege
Wilfried Gansterer
85
0
0
15 Sep 2025
An Analysis of Layer-Freezing Strategies for Enhanced Transfer Learning in YOLO Architectures
Andrzej D. Dobrzycki
Ana M. Bernardos
José Ramón Casar
80
1
0
05 Sep 2025
Globally aware optimization with resurgence
Wei Bu
36
0
0
01 Sep 2025
Adaptive Heavy-Tailed Stochastic Gradient Descent
Bodu Gong
Gustavo Enrique Batista
Pierre Lafaye de Micheaux
112
0
0
29 Aug 2025
Algebraic Approach to Ridge-Regularized Mean Squared Error Minimization in Minimal ReLU Neural Network
Ryoya Fukasaku
Y. Kabata
Akifumi Okuno
88
0
0
25 Aug 2025
Understanding Data Influence with Differential Approximation
Haoru Tan
Sitong Wu
Xiuzhe Wu
Wang Wang
Bo Zhao
Zeke Xie
Gui-Song Xia
Xiaojuan Qi
TDI
226
1
0
20 Aug 2025
A Spin Glass Characterization of Neural Networks
Jun Li
80
0
0
10 Aug 2025
Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning
Tolga Dimlioglu
A. Choromańska
FedML
238
1
0
27 Jul 2025
Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training
Yue Hu
Zanxia Cao
Yingchao Liu
ODL
270
1
0
26 Jul 2025
Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
YuXin Li
Felix Dangel
Derek Tam
Colin Raffel
161
2
0
24 Jul 2025
Towards Robust Surrogate Models: Benchmarking Machine Learning Approaches to Expediting Phase Field Simulations of Brittle Fracture
Erfan Hamdi
Emma Lejeune
OOD
AI4CE
197
1
0
09 Jul 2025
HiPreNets: High-Precision Neural Networks through Progressive Training
Ethan Mulle
W. Kang
Q. Gong
176
0
0
18 Jun 2025
A Study of Hybrid and Evolutionary Metaheuristics for Single Hidden Layer Feedforward Neural Network Architecture
Gautam Siddharth Kashyap
Md. Tabrez Nafis
S. Wazir
220
0
0
17 Jun 2025
Flat Channels to Infinity in Neural Loss Landscapes
Flavio Martinelli
Alexander Van Meegen
Berfin Simsek
W. Gerstner
Johanni Brea
253
2
0
17 Jun 2025
Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models?
Sigma Jahan
Mohammad Masudur Rahman
128
0
0
09 Jun 2025
Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation
Zhan Zhuang
Xiequn Wang
Wei Li
Yulong Zhang
Qiushi Huang
...
Yanbin Wei
Yuhe Nie
Kede Ma
Yu Zhang
Ying Wei
253
0
0
06 Jun 2025
A projection-based framework for gradient-free and parallel learning
Andreas Bergmeister
Manish Krishan Lal
Stefanie Jegelka
S. Sra
188
0
0
06 Jun 2025
Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning
Yongkang Liu
Xingle Xu
Ercong Nie
Zijing Wang
Shi Feng
Daling Wang
Qian Li
Hinrich Schutze
178
0
0
28 May 2025
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Chaerin Kong
Jiho Jang
Nojun Kwak
410
0
0
22 May 2025
Block-Biased Mamba for Long-Range Sequence Processing
Annan Yu
N. Benjamin Erichson
Mamba
295
2
0
13 May 2025
Phase Transitions between Accuracy Regimes in L2 regularized Deep Neural Networks
Ibrahim Talha Ersoy
Karoline Wiesner
236
0
0
10 May 2025
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Jianfeng Yao
Jianfeng Yao
250
2
0
05 May 2025
Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods
Andres Fernandez
Frank Schneider
Maren Mahsereci
Philipp Hennig
314
1
0
20 Apr 2025
SDEIT: Semantic-Driven Electrical Impedance Tomography
Dong Liu
Yuanchao Wu
Bowen Tong
Jiansong Deng
DiffM
202
0
0
05 Apr 2025
Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition
Brianna Chrisman
Lucius Bushnaq
Lee D. Sharkey
258
2
0
31 Mar 2025
Almost Bayesian: The Fractal Dynamics of Stochastic Gradient Descent
Max Hennick
Stijn De Baerdemacker
188
3
0
28 Mar 2025
High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise
Yuchen Fang
Javad Lavaei
Katya Scheinberg
217
0
0
24 Mar 2025
A Framework for Finding Local Saddle Points in Two-Player Zero-Sum Black-Box Games
Shubhankar Agarwal
Hamzah I. Khan
Sandeep Chinchali
David Fridovich-Keil
279
1
0
23 Mar 2025
From Equations to Insights: Unraveling Symbolic Structures in PDEs with LLMs
Rohan Bhatnagar
Ling Liang
Krish Patel
Haizhao Yang
261
2
0
13 Mar 2025
Hamiltonian Neural Networks for Robust Out-of-Time Credit Scoring
Javier Marín
333
0
0
13 Mar 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL
AAML
1.1K
1
0
25 Feb 2025
Verification and Validation for Trustworthy Scientific Machine Learning
John D. Jakeman
Lorena A. Barba
J. Martins
Thomas O'Leary-Roseberry
AI4CE
412
2
0
21 Feb 2025
Optimal Subspace Inference for the Laplace Approximation of Bayesian Neural Networks
Josua Faller
Jörg Martin
BDL
320
0
0
04 Feb 2025
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning
International Conference on Learning Representations (ICLR), 2025
Haque Ishfaq
Guangyuan Wang
Sami Nur Islam
Doina Precup
291
9
0
29 Jan 2025
SSE-SAM: Balancing Head and Tail Classes Gradually through Stage-Wise SAM
AAAI Conference on Artificial Intelligence (AAAI), 2024
Xingyu Lyu
Qianqian Xu
Zhiyong Yang
Shaojie Lyu
Qingming Huang
416
1
0
18 Dec 2024
Causal Invariance Learning via Efficient Optimization of a Nonconvex Objective
Zhenyu Wang
Yifan Hu
Peter Buhlmann
Zijian Guo
370
3
0
16 Dec 2024
Curvature in the Looking-Glass: Optimal Methods to Exploit Curvature of Expectation in the Loss Landscape
Jed A. Duersch
Tommie A. Catanach
Alexander Safonov
Jeremy Wendt
306
0
0
25 Nov 2024
Neural Network-based High-index Saddle Dynamics Method for Searching Saddle Points and Solution Landscape
Yuankai Liu
Lei Zhang
Jin Zhao
147
1
0
25 Nov 2024
Don't Be So Positive: Negative Step Sizes in Second-Order Methods
Betty Shea
Mark Schmidt
ODL
218
2
0
18 Nov 2024
Data movement limits to frontier model training
Ege Erdil
David Schneider-Joseph
330
5
0
02 Nov 2024
CopRA: A Progressive LoRA Training Strategy
Zhan Zhuang
Xiequn Wang
Yulong Zhang
Wei Li
Yu Zhang
Ying Wei
232
1
0
30 Oct 2024
A Mathematical Analysis of Neural Operator Behaviors
Vu-Anh Le
Mehmet Dik
AI4CE
160
5
0
28 Oct 2024
Trust-Region Eigenvalue Filtering for Projected Newton
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024
Honglin Chen
Hsueh-Ti Derek Liu
Alec Jacobson
David I. W. Levin
Changxi Zheng
142
4
0
14 Oct 2024
Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization
Prateek Varshney
Mert Pilanci
328
0
0
09 Oct 2024