Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Neural Information Processing Systems (NeurIPS), 2015

8 June 2015

Papers citing "Path-SGD: Path-Normalized Optimization in Deep Neural Networks"

50 / 195 papers shown

Isotropic Curvature Model for Understanding Deep Learning Optimization: Is Gradient Orthogonalization Optimal?

Weijie Su

145

01 Nov 2025

Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias

Shuofeng Zhang

A. Louis

212

25 Sep 2025

Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond

150

01 Aug 2025

Symmetry in Neural Network Parameter Spaces

Bo Zhao

Robin Walters

Rose Yu

370

16 Jun 2025

Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings

Evan Markou

Thalaiyasingam Ajanthan

Stephen Gould

314

10 Jun 2025

Improving Learning to Optimize Using Parameter Symmetries

317

21 Apr 2025

The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE

Andrei Chernov

Oleg Novitskij

382

24 Feb 2025

Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion

639

01 Feb 2025

An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models

Neural Information Processing Systems (NeurIPS), 2024

Yunzhe Hu

Difan Zou

Dong Xu

383

26 Nov 2024

Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems

Neural Information Processing Systems (NeurIPS), 2024

Bingcong Li

Liang Zhang

Niao He

284

18 Oct 2024

Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks

364

02 Oct 2024

Monomial Matrix Group Equivariant Neural Functional Networks

Neural Information Processing Systems (NeurIPS), 2024

Hoang V. Tran

Thieu N. Vo

Tho H. Tran

An T. Nguyen

Tan M. Nguyen

474

18 Sep 2024

Application of Langevin Dynamics to Advance the Quantum Natural Gradient Optimization Algorithm

389

03 Sep 2024

Quantum-secure multiparty deep learning

261

10 Aug 2024

Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis?

IEEE Access (IEEE Access), 2024

Mohamed Hassan

Aleksandar Vakanski

Min Xian

AAML MedIm

387

07 Aug 2024

Scale Equivariant Graph Metanetworks

Ioannis Kalogeropoulos

Giorgos Bouritsas

Yannis Panagakis

392

15 Jun 2024

ReLUs Are Sufficient for Learning Implicit Neural Representations

Joseph Shenouda

Yamin Zhou

Robert D. Nowak

246

04 Jun 2024

Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization

301

03 Jun 2024

The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof

471

30 May 2024

Scalable Optimization in the Modular Norm

Neural Information Processing Systems (NeurIPS), 2024

Yang Liu

244

23 May 2024

Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization

Aditya Biswas

262

29 Apr 2024

On the Benefits of Over-parameterization for Out-of-Distribution Generalization

Yifan Hao

Yong Lin

Difan Zou

Tong Zhang

OODD OOD

245

26 Mar 2024

Boosting Adversarial Training via Fisher-Rao Norm-based Regularization

Xiangyu Yin

Wenjie Ruan

AAML

177

26 Mar 2024

Understanding the Double Descent Phenomenon in Deep Learning

Marc Lafon

Alexandre Thomas

350

15 Mar 2024

Level Set Teleportation: An Optimization Perspective

Aaron Mishkin

A. Bietti

Robert Mansel Gower

308

05 Mar 2024

Fine-tuning with Very Large Dropout

Jianyu Zhang

Léon Bottou

389

01 Mar 2024

Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures

Valentina Zantedeschi

320

19 Feb 2024

Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate

Hongwu Peng

335

05 Feb 2024

Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE

256

04 Feb 2024

The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

Yifan Hao

Tong Zhang

AAML

507

19 Jan 2024

Applying statistical learning theory to deep learning

Journal of Statistical Mechanics: Theory and Experiment (J. Stat. Mech.), 2023

248

26 Nov 2023

Optimization dependent generalization bound for ReLU networks based on sensitivity in the tangent bundle

231

26 Oct 2023

A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors

International Conference on Learning Representations (ICLR), 2023

289

12 Oct 2023

Deep Neural Networks Tend To Extrapolate Predictably

International Conference on Learning Representations (ICLR), 2023

Katie Kang

Amrith Rajagopal Setlur

Claire Tomlin

Sergey Levine

213

02 Oct 2023

Fantastic Generalization Measures are Nowhere to be Found

International Conference on Learning Representations (ICLR), 2023

368

24 Sep 2023

Weighted variation spaces and approximation by shallow ReLU networks

Applied and Computational Harmonic Analysis (ACHA), 2023

247

28 Jul 2023

Quantum Machine Learning on Near-Term Quantum Devices: Current State of Supervised and Unsupervised Techniques for Real-World Applications

Physical Review Applied (Phys. Rev. Appl.), 2023

Yaswitha Gujju

A. Matsuo

Raymond H. Putra

451

03 Jul 2023

Nonparametric regression using over-parameterized shallow ReLU neural networks

Journal of Machine Learning Research (JMLR), 2023

Yunfei Yang

Ding-Xuan Zhou

347

14 Jun 2023

Hidden symmetries of ReLU networks

International Conference on Machine Learning (ICML), 2023

J. E. Grigsby

Kathryn A. Lindsey

David Rolnick

266

09 Jun 2023

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

International Conference on Machine Learning (ICML), 2023

Atli Kosson

Bettina Messmer

Martin Jaggi

456

26 May 2023

Improving Convergence and Generalization Using Parameter Symmetries

International Conference on Learning Representations (ICLR), 2023

393

22 May 2023

Exploring the Complexity of Deep Neural Networks through Functional Equivalence

International Conference on Machine Learning (ICML), 2023

Guohao Shen

372

19 May 2023

Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

Jing An

Jianfeng Lu

197

18 Apr 2023

Solving Regularized Exp, Cosh and Sinh Regression Problems

Zhihang Li

Zhao Song

Wanrong Zhu

199

28 Mar 2023

Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation

USENIX Security Symposium (USENIX Security), 2023

249

17 Mar 2023

The Geometry of Neural Nets' Parameter Spaces Under Reparametrization

Neural Information Processing Systems (NeurIPS), 2023

Agustinus Kristiadi

Felix Dangel

Philipp Hennig

236

14 Feb 2023

Equivariant Architectures for Learning in Deep Weight Spaces

International Conference on Machine Learning (ICML), 2023

352

30 Jan 2023

Quantifying the Impact of Label Noise on Federated Learning

Shuqi Ke

Chao Huang

Xin Liu

FedML

343

15 Nov 2022

Instance-Dependent Generalization Bounds via Optimal Transport

Journal of Machine Learning Research (JMLR), 2022

500

02 Nov 2022

Symmetries, flat minima, and the conserved quantities of gradient flow

International Conference on Learning Representations (ICLR), 2022

366

31 Oct 2022