All Papers


Title

A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization Min Gan Guang-yong Chen Yang Yi Lin Yang 56 0 0 03 Nov 2025
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region Shuang Liang Guido Montúfar 155 0 0 29 Sep 2025
Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models Tianxiao Cao Kyohei Atarashi H. Kashima 170 0 0 14 Aug 2025
Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond Jiaxin Deng Qingcheng Zhu Junbiao Pang Linlin Yang Zhongqian Fu Baochang Zhang 81 0 0 01 Aug 2025
Symmetry in Neural Network Parameter Spaces Bo Zhao Robin Walters Rose Yu 261 6 0 16 Jun 2025
Transformative or Conservative? Conservation laws for ResNets and Transformers Sibylle Marcotte Rémi Gribonval Gabriel Peyré 204 3 0 06 Jun 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks D. Kunin Giovanni Luca Marchetti F. Chen Dhruva Karkada James B. Simon M. DeWeese Surya Ganguli Nina Miolane 281 3 0 06 Jun 2025
PoLAR: Polar-Decomposed Low-Rank Adapter Representation Kai Lion Liang Zhang Bingcong Li Niao He 188 3 0 03 Jun 2025
RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models Yilang Zhang Bingcong Li G. Giannakis 494 2 0 24 May 2025
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models Ziqing Xu Hancheng Min Salma Tarmoun Enrique Mallada Rene Vidal 225 2 0 16 May 2025
A Minimalist Example of Edge-of-Stability and Progressive Sharpening Liming Liu Zixuan Zhang S. Du T. Zhao 263 1 0 04 Mar 2025
Low-rank bias, weight decay, and model merging in neural networks Ilja Kuzborskij Yasin Abbasi-Yadkori 263 1 0 24 Feb 2025
The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks. Annual Conference Computational Learning Theory (COLT), 2025 Sholom Schechtman Nicolas Schreuder 953 0 0 08 Feb 2025
$k$-SVD with Gradient Descent Yassir Jedra Yassir Jedra 372 0 0 01 Feb 2025
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion Binchi Zhang Zaiyi Zheng Zhengzhang Chen Wenlin Yao 494 5 0 01 Feb 2025
Algebra Unveils Deep Learning -- An Invitation to Neuroalgebraic Geometry Giovanni Luca Marchetti Vahid Shahverdi Stefano Mereta Matthew Trager Kathlén Kohn 296 2 0 31 Jan 2025
Training Dynamics of In-Context Learning in Linear Attention Yedi Zhang Aaditya K. Singh Peter E. Latham Andrew Saxe MLT 255 19 0 27 Jan 2025
Geometry and Optimization of Shallow Polynomial Networks Yossi Arjevani Joan Bruna Joe Kileel Elzbieta Polak Matthew Trager 216 4 0 10 Jan 2025
How Feature Learning Can Improve Neural Scaling Laws. International Conference on Learning Representations (ICLR), 2024 Blake Bordelon Alexander B. Atanasov Cengiz Pehlevan 386 32 0 26 Sep 2024
In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting. AAAI Conference on Artificial Intelligence (AAAI), 2024 Constantin Philippenko Kevin Scaman Laurent Massoulié FedML 289 4 0 13 Sep 2024
Approaching Deep Learning through the Spectral Dynamics of Weights David Yunis Kumar Kshitij Patel Samuel Wheeler Pedro H. P. Savarese Gal Vardi Karen Livescu Michael Maire Matthew R. Walter 254 12 0 21 Aug 2024
Masks, Signs, And Learning Rate Rewinding Advait Gadhikar R. Burkholz 204 14 0 29 Feb 2024
Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space Mingyang Yi Bohan Wang 253 0 0 24 Jan 2024
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult Yuqing Wang Zhenghao Xu Tuo Zhao Molei Tao 274 16 0 26 Oct 2023
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization. International Conference on Learning Representations (ICLR), 2023 Nuoya Xiong Lijun Ding Simon S. Du 394 18 0 03 Oct 2023
Implicit Regularization Makes Overparameterized Asymmetric Matrix Sensing Robust to Perturbations J. S. Wind 193 2 0 04 Sep 2023
Trained Transformers Learn Linear Models In-Context. Journal of machine learning research (JMLR), 2023 Ruiqi Zhang Spencer Frei Peter L. Bartlett 333 270 0 16 Jun 2023
Neural (Tangent Kernel) Collapse. Neural Information Processing Systems (NeurIPS), 2023 Mariia Seleznova Dana Weitzner Raja Giryes Gitta Kutyniok H. Chou 248 14 0 25 May 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond. International Conference on Machine Learning (ICML), 2023 Itai Kreisler Mor Shpigel Nacson Daniel Soudry Y. Carmon 179 16 0 22 May 2023
Convergence of Alternating Gradient Descent for Matrix Factorization. Neural Information Processing Systems (NeurIPS), 2023 R. Ward T. Kolda 207 12 0 11 May 2023
On the Stepwise Nature of Self-Supervised Learning. International Conference on Machine Learning (ICML), 2023 James B. Simon Maksis Knutins Liu Ziyin Daniel Geisz Abraham J. Fetterman Joshua Albrecht SSL 216 38 0 27 Mar 2023
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss. International Conference on Machine Learning (ICML), 2023 Pierre Bréchet Katerina Papagiannouli Jing An Guido Montúfar 338 7 0 06 Mar 2023
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron. Annual Conference Computational Learning Theory (COLT), 2023 Weihang Xu S. Du 259 21 0 20 Feb 2023
How to prepare your task head for finetuning. International Conference on Learning Representations (ICLR), 2023 Yi Ren Shangmin Guo Wonho Bae Danica J. Sutherland 110 18 0 11 Feb 2023
Implicit Regularization for Group Sparsity. International Conference on Learning Representations (ICLR), 2023 Jiangyuan Li THANH VAN NGUYEN Chinmay Hegde Raymond K. W. Wong 201 11 0 29 Jan 2023
Effects of Data Geometry in Early Deep Learning. Neural Information Processing Systems (NeurIPS), 2022 Saket Tiwari George Konidaris 289 7 0 29 Dec 2022
Improved Convergence Guarantees for Shallow Neural Networks A. Razborov ODL 181 1 0 05 Dec 2022
Infinite-width limit of deep linear neural networks. Communications on Pure and Applied Mathematics (CPAM), 2022 Lénaïc Chizat Maria Colombo Xavier Fernández-Real Alessio Figalli 162 21 0 29 Nov 2022
Mechanistic Mode Connectivity. International Conference on Machine Learning (ICML), 2022 Ekdeep Singh Lubana Eric J. Bigelow Robert P. Dick David M. Krueger Hidenori Tanaka 246 56 0 15 Nov 2022
Symmetries, flat minima, and the conserved quantities of gradient flow. International Conference on Learning Representations (ICLR), 2022 Bo Zhao I. Ganev Robin Walters Rose Yu Nima Dehmamy 304 26 0 31 Oct 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models. International Conference on Machine Learning (ICML), 2022 Hong Liu Sang Michael Xie Zhiyuan Li Tengyu Ma AI4CE 296 67 0 25 Oct 2022
Surgical Fine-Tuning Improves Adaptation to Distribution Shifts. International Conference on Learning Representations (ICLR), 2022 Yoonho Lee Annie S. Chen Fahim Tajwar Ananya Kumar Huaxiu Yao Abigail Z. Jacobs Chelsea Finn OOD 250 250 0 20 Oct 2022
Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise. International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 Haotian Ye James Zou Linjun Zhang OOD 303 27 0 20 Oct 2022
Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks A. K. Akash Sixu Li Nicolas García Trillos 178 15 0 13 Oct 2022
Boosting Adversarial Robustness From The Perspective of Effective Margin Regularization. British Machine Vision Conference (BMVC), 2022 Ziquan Liu Antoni B. Chan AAML 138 6 0 11 Oct 2022
Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions. International Conference on Learning Representations (ICLR), 2022 Arthur Jacot 310 36 0 29 Sep 2022
Magnitude and Angle Dynamics in Training Single ReLU Neurons. Neural Networks (NN), 2022 Sangmin Lee Byeongsu Sim Jong Chul Ye MLT 303 6 0 27 Sep 2022
A Validation Approach to Over-parameterized Matrix and Image Recovery Lijun Ding Zhen Qin Liwei Jiang Jinxin Zhou Zhihui Zhu 332 15 0 21 Sep 2022
Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization). Neural Information Processing Systems (NeurIPS), 2022 Zhenyu Zhu Fanghui Liu Grigorios G. Chrysos Volkan Cevher 258 23 0 15 Sep 2022
On the Implicit Bias in Deep-Learning Algorithms. Communications of the ACM (CACM), 2022 Gal Vardi FedML AI4CE 292 107 0 26 Aug 2022


