ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.07042
  4. Cited By
Transformers learn through gradual rank increase
v1v2 (latest)

Transformers learn through gradual rank increase

Neural Information Processing Systems (NeurIPS), 2023
12 June 2023
Enric Boix-Adserà
Etai Littwin
Emmanuel Abbe
Samy Bengio
J. Susskind
ArXiv (abs)PDFHTMLHuggingFace (9 upvotes)

Papers citing "Transformers learn through gradual rank increase"

20 / 20 papers shown
Title
SCALE: Upscaled Continual Learning of Large Language Models
SCALE: Upscaled Continual Learning of Large Language Models
Jin-woo Lee
Junhwa Choi
Bongkyu Hwang
Jinho Choo
Bogun Kim
...
Joonseok Lee
DongYoung Jung
Jaeseon Park
Kyoungwon Park
Suk-hoon Jung
CLLLRM
314
0
0
05 Nov 2025
Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparamerterized Matrix Factorization
Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparamerterized Matrix Factorization
Hancheng Min
Rene Vidal
CLLMLT
76
1
0
28 Aug 2025
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Yixiao Huang
Hanlin Zhu
Tianyu Guo
Jiantao Jiao
Somayeh Sojoudi
Michael I. Jordan
Stuart Russell
Song Mei
LRM
445
4
0
12 Jun 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin
Giovanni Luca Marchetti
F. Chen
Dhruva Karkada
James B. Simon
M. DeWeese
Surya Ganguli
Nina Miolane
281
3
0
06 Jun 2025
PoLAR: Polar-Decomposed Low-Rank Adapter Representation
PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion
Liang Zhang
Bingcong Li
Niao He
184
3
0
03 Jun 2025
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism
Sameera Ramasinghe
Thalaiyasingam Ajanthan
Gil Avraham
Yan Zuo
Alexander Long
GNN
309
0
0
02 Jun 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
484
4
0
02 May 2025
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
Nischal Mainali
Lucas Teixeira
221
2
0
17 Apr 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear TransformersInternational Conference on Learning Representations (ICLR), 2025
Hongkang Li
Yihua Zhang
Shuai Zhang
Ming Wang
Sijia Liu
Pin-Yu Chen
MoMe
653
17
0
15 Apr 2025
Spectral Architecture Search for Neural Network Models
Spectral Architecture Search for Neural Network Models
Gianluca Peri
Lorenzo Giambagli
Lorenzo Chicchi
Duccio Fanelli
147
0
0
01 Apr 2025
A distributional simplicity bias in the learning dynamics of transformers
A distributional simplicity bias in the learning dynamics of transformersNeural Information Processing Systems (NeurIPS), 2024
Riccardo Rende
Federica Gerace
Alessandro Laio
Sebastian Goldt
269
13
0
17 Feb 2025
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic TransformationsComputer Vision and Pattern Recognition (CVPR), 2025
Krishna Sri Ipsit Mantri
Carola-Bibiane Schönlieb
Bruno Ribeiro
Chaim Baskin
Moshe Eliasof
351
5
0
09 Feb 2025
Training Dynamics of In-Context Learning in Linear Attention
Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang
Aaditya K. Singh
Peter E. Latham
Andrew Saxe
MLT
251
19
0
27 Jan 2025
Geometric Signatures of Compositionality Across a Language Model's Lifetime
Geometric Signatures of Compositionality Across a Language Model's LifetimeAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Jin Hwa Lee
Thomas Jiralerspong
Lei Yu
Yoshua Bengio
Emily Cheng
CoGe
537
8
0
02 Oct 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis
Kumar Kshitij Patel
Samuel Wheeler
Pedro H. P. Savarese
Gal Vardi
Karen Livescu
Michael Maire
Matthew R. Walter
254
12
0
21 Aug 2024
Reasoning in Large Language Models: A Geometric Perspective
Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino
Sarath Shekkizhar
LRM
180
3
0
02 Jul 2024
Compression of Structured Data with Autoencoders: Provable Benefit of
  Nonlinearities and Depth
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth
Kevin Kögler
Aleksandr Shevchenko
Hamed Hassani
Marco Mondelli
MLT
197
1
0
07 Feb 2024
A phase transition between positional and semantic learning in a
  solvable model of dot-product attention
A phase transition between positional and semantic learning in a solvable model of dot-product attentionNeural Information Processing Systems (NeurIPS), 2024
Hugo Cui
Freya Behrens
Florent Krzakala
Lenka Zdeborová
MLT
202
24
0
06 Feb 2024
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and
  Attention
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and AttentionInternational Conference on Learning Representations (ICLR), 2023
Yuandong Tian
Yiping Wang
Zhenyu Zhang
Beidi Chen
Simon Shaolei Du
266
45
0
01 Oct 2023
Saddle-to-Saddle Dynamics in Diagonal Linear Networks
Saddle-to-Saddle Dynamics in Diagonal Linear NetworksNeural Information Processing Systems (NeurIPS), 2023
Scott Pesme
Nicolas Flammarion
360
45
0
02 Apr 2023
1