arXiv: 2403.09635
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models
International Conference on Machine Learning (ICML), 2024
14 March 2024
Authors: Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia, Jungho Jung, Harshith Goka, Haejun Lee
Links: arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (10★)
Papers citing "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" (4 of 4 papers shown)
Normalization in Attention Dynamics
Nikita Karagodin, Shu Ge, Yury Polyanskiy, Philippe Rigollet
24 Oct 2025 · 274 · 3 · 0
Stability of Transformers under Layer Normalization
Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Krishna Kumar, Markos A. Katsoulakis
10 Oct 2025 · 178 · 3 · 0
Short-Range Dependency Effects on Transformer Instability and a Decomposed Attention Solution
Suvadeep Hajra
21 May 2025 · 323 · 1 · 0
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, Joel Hestness
02 May 2025 · 702 · 38 · 0