A Spectral Condition for Feature Learning
arXiv:2310.17813, v2 (latest)

26 October 2023
Greg Yang
James B. Simon
Jeremy Bernstein

Papers citing "A Spectral Condition for Feature Learning"

41 / 41 papers shown
Controlling changes to attention logits
Ben Anson
Laurence Aitchison
226
0
0
26 Nov 2025
Deep Progressive Training: scaling up depth capacity of zero/one-layer models
Zhiqi Bu
AI4CE
173
0
0
07 Nov 2025
Weight Decay may matter more than muP for Learning Rate Transfer in Practice
Atli Kosson
Jeremy Welborn
Yang Liu
Martin Jaggi
Xi Chen
236
5
0
21 Oct 2025
Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
Zhiyuan Fan
Yifeng Liu
Qingyue Zhao
Angela Yuan
Quanquan Gu
149
3
0
17 Oct 2025
AdaPM: a Partial Momentum Algorithm for LLM Training
Yimu Zhang
Yuanshi Liu
Cong Fang
225
2
0
10 Oct 2025
POME: Post Optimization Model Edit via Muon-style Projection
Yong Liu
Di Fu
Yang Luo
Zirui Zhu
Minhao Cheng
Cho-Jui Hsieh
Yang You
131
2
0
08 Oct 2025
Spectral Alignment as Predictor of Loss Explosion in Neural Network Training
Haiquan Qiu
You Wu
Yingjie Tan
Yaqing Wang
Quanming Yao
142
0
0
05 Oct 2025
Optimal Scaling Needs Optimal Norm
Oleg Filatov
Jiangtao Wang
J. Ebert
Stefan Kesselheim
241
3
0
04 Oct 2025
Muon Outperforms Adam in Tail-End Associative Memory Learning
Shuche Wang
Fengzhuo Zhang
Jiaxiang Li
Cunxiao Du
C. Du
Tianyu Pang
Zhuoran Yang
Mingyi Hong
Vincent Y. F. Tan
219
14
0
30 Sep 2025
Conda: Column-Normalized Adam for Training Large Language Models Faster
Junjie Wang
Pan Zhou
Yiming Dong
Huan Li
Jia Li
Xun Zhou
Qicheng Lao
Cong Fang
Zhouchen Lin
AI4CE
294
2
0
29 Sep 2025
Beyond Outliers: A Study of Optimizers Under Quantization
Georgios Vlassis
Saleh Ashkboos
Alexandra Volkova
Torsten Hoefler
Dan Alistarh
MQ
312
4
0
27 Sep 2025
Understanding Post-Training Structural Changes in Large Language Models
Xinyu He
Xianghui Cao
253
0
0
22 Sep 2025
Customizing the Inductive Biases of Softmax Attention using Structured Matrices
Yilun Kuang
Noah Amsel
Sanae Lotfi
Shikai Qiu
Andres Potapczynski
Andrew Gordon Wilson
177
0
0
09 Sep 2025
μ-Parametrization for Mixture of Experts
Jan Małaśnicki
Kamil Ciebiera
Mateusz Boruń
Maciej Pióro
Jan Ludziejewski
...
Michał Krutul
Sebastian Jaszczur
Marek Cygan
Kamil Adamczewski
Jakub Krajewski
MoE
269
0
0
13 Aug 2025
Knowing When to Quit: Probabilistic Early Exits for Speech Separation
Kenny Falkær Olsen
Mads Østergaard
Karl Ulbæk
S. F. V. Nielsen
Rasmus Malik Høegh Lindrup
Bjørn Sand Jensen
Morten Mørup
UQCV
370
1
0
13 Jul 2025
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans
Sergey Levine
Pieter Abbeel
516
8
0
08 Jun 2025
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism
Sameera Ramasinghe
Thalaiyasingam Ajanthan
Gil Avraham
Yan Zuo
Alexander Long
GNN
471
2
0
02 Jun 2025
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
Yehonathan Refael
Guy Smorodinsky
Tom Tirer
Ofir Lindenbaum
249
11
0
30 May 2025
The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm
Noah Amsel
David Persson
Christopher Musco
Robert Gower
331
38
0
22 May 2025
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Boyao Wang
Shiqian Ma
Tong Zhang
ODL
558
38
0
26 Mar 2025
Global Convergence and Rich Feature Learning in L-Layer Infinite-Width Neural Networks under μP Parametrization
Zixiang Chen
Greg Yang
Qingyue Zhao
Q. Gu
MLT
307
3
0
12 Mar 2025
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
Yehonathan Refael
Iftach Arbel
Ofir Lindenbaum
Tom Tirer
476
4
0
26 Feb 2025
Function-Space Learning Rates
Edward Milsom
Ben Anson
Laurence Aitchison
536
3
0
24 Feb 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
427
8
0
10 Jan 2025
AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
International Conference on Learning Representations (ICLR), 2024
Yehonathan Refael
Jonathan Svirsky
Boris Shustin
Wasim Huleihel
Ofir Lindenbaum
365
14
0
31 Dec 2024
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Neural Information Processing Systems (NeurIPS), 2024
Atli Kosson
Bettina Messmer
Martin Jaggi
AI4CE
338
16
0
31 Oct 2024
Modular Duality in Deep Learning
Jeremy Bernstein
Laker Newhouse
212
41
0
28 Oct 2024
Plastic Learning with Deep Fourier Features
International Conference on Learning Representations (ICLR), 2024
Alex Lewandowski
Dale Schuurmans
Marlos C. Machado
CLL
327
13
0
27 Oct 2024
The Optimization Landscape of SGD Across the Feature Learning Strength
International Conference on Learning Representations (ICLR), 2024
Alexander B. Atanasov
Alexandru Meterez
James B. Simon
Cengiz Pehlevan
496
12
0
06 Oct 2024
Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
Neural Information Processing Systems (NeurIPS), 2024
Andres Potapczynski
Shikai Qiu
Marc Finzi
Christopher Ferri
Zixi Chen
Micah Goldblum
Bayan Bruss
Christopher De Sa
Andrew Gordon Wilson
297
8
0
03 Oct 2024
Old Optimizer, New Norm: An Anthology
Jeremy Bernstein
Laker Newhouse
ODL
394
96
0
30 Sep 2024
u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
410
20
0
24 Jul 2024
Compute Better Spent: Replacing Dense Layers with Structured Matrices
Shikai Qiu
Andres Potapczynski
Marc Finzi
Micah Goldblum
Andrew Gordon Wilson
311
24
0
10 Jun 2024
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
Neural Information Processing Systems (NeurIPS), 2024
D. Kunin
Allan Raventós
Clémentine Dominé
Feng Chen
David Klindt
Andrew M. Saxe
Surya Ganguli
MLT
388
34
0
10 Jun 2024
Recurrent neural networks: vanishing and exploding gradients are not the end of the story
Nicolas Zucchet
Antonio Orvieto
ODL
AAML
362
58
0
31 May 2024
Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon
Hamza Tahir Chaudhry
Cengiz Pehlevan
AI4CE
431
32
0
24 May 2024
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
Nolan Dey
Shane Bergsma
Joel Hestness
391
10
0
24 May 2024
Scalable Optimization in the Modular Norm
Neural Information Processing Systems (NeurIPS), 2024
Tim Large
Yang Liu
Minyoung Huh
Hyojin Bahng
Phillip Isola
Jeremy Bernstein
320
42
0
23 May 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu Zhang
Beidi Chen
Zinan Lin
A. Anandkumar
Yuandong Tian
564
416
0
06 Mar 2024
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
535
40
0
28 Dec 2023
The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks
Neural Information Processing Systems (NeurIPS), 2023
Lénaic Chizat
Praneeth Netrapalli
527
11
0
30 Nov 2023