
Transformers learn through gradual rank increase

Neural Information Processing Systems (NeurIPS), 2023

12 June 2023


Papers citing "Transformers learn through gradual rank increase"


SCALE: Upscaled Continual Learning of Large Language Models. Jin-woo Lee, Junhwa Choi, Bongkyu Hwang, Jinho Choo, Bogun Kim, ..., Joonseok Lee, DongYoung Jung, Jaeseon Park, Kyoungwon Park, Suk-hoon Jung. 05 Nov 2025.
Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparamerterized Matrix Factorization. Hancheng Min, Rene Vidal. 28 Aug 2025.
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers. Yixiao Huang, Hanlin Zhu, Tianyu Guo, Jiantao Jiao, Somayeh Sojoudi, Michael I. Jordan, Stuart Russell, Song Mei. 12 Jun 2025.
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks. D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane. 06 Jun 2025.
PoLAR: Polar-Decomposed Low-Rank Adapter Representation. Kai Lion, Liang Zhang, Bingcong Li, Niao He. 03 Jun 2025.
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism. Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, Alexander Long. 02 Jun 2025.
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias. Ruiquan Huang, Yingbin Liang, Jing Yang. 02 May 2025.
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers. Nischal Mainali, Lucas Teixeira. 17 Apr 2025.
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers. International Conference on Learning Representations (ICLR), 2025. Hongkang Li, Yihua Zhang, Shuai Zhang, Ming Wang, Sijia Liu, Pin-Yu Chen. 15 Apr 2025.
Spectral Architecture Search for Neural Network Models. Gianluca Peri, Lorenzo Giambagli, Lorenzo Chicchi, Duccio Fanelli. 01 Apr 2025.
A distributional simplicity bias in the learning dynamics of transformers. Neural Information Processing Systems (NeurIPS), 2024. Riccardo Rende, Federica Gerace, Alessandro Laio, Sebastian Goldt. 17 Feb 2025.
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations. Computer Vision and Pattern Recognition (CVPR), 2025. Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro, Chaim Baskin, Moshe Eliasof. 09 Feb 2025.
Training Dynamics of In-Context Learning in Linear Attention. Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe. 27 Jan 2025.
Geometric Signatures of Compositionality Across a Language Model's Lifetime. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. Jin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng. 02 Oct 2024.
Approaching Deep Learning through the Spectral Dynamics of Weights. David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro H. P. Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew R. Walter. 21 Aug 2024.
Reasoning in Large Language Models: A Geometric Perspective. Romain Cosentino, Sarath Shekkizhar. 02 Jul 2024.
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth. Kevin Kögler, Aleksandr Shevchenko, Hamed Hassani, Marco Mondelli. 07 Feb 2024.
A phase transition between positional and semantic learning in a solvable model of dot-product attention. Neural Information Processing Systems (NeurIPS), 2024. Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová. 06 Feb 2024.
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention. International Conference on Learning Representations (ICLR), 2023. Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du. 01 Oct 2023.
Saddle-to-Saddle Dynamics in Diagonal Linear Networks. Neural Information Processing Systems (NeurIPS), 2023. Scott Pesme, Nicolas Flammarion. 02 Apr 2023.