ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2206.03126
  4. Cited By
Signal Propagation in Transformers: Theoretical Perspectives and the
  Role of Rank Collapse

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

7 June 2022
Lorenzo Noci
Sotiris Anagnostidis
Luca Biggio
Antonio Orvieto
Sidak Pal Singh
Aurélien Lucchi
ArXivPDFHTML

Papers citing "Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse"

23 / 23 papers shown
Title
Don't be lazy: CompleteP enables compute-efficient deep transformers
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Bill Li
Blake Bordelon
Shane Bergsma
C. Pehlevan
Boris Hanin
Joel Hestness
39
0
0
02 May 2025
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren
Yong Liu
114
0
0
26 Apr 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
46
0
0
06 Mar 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
87
0
0
27 Feb 2025
The Geometry of Tokens in Internal Representations of Large Language Models
The Geometry of Tokens in Internal Representations of Large Language Models
Karthik Viswanathan
Yuri Gardinazzi
Giada Panerai
Alberto Cazzaniga
Matteo Biagetti
AIFin
88
4
0
17 Jan 2025
Activating Self-Attention for Multi-Scene Absolute Pose Regression
Activating Self-Attention for Multi-Scene Absolute Pose Regression
Miso Lee
Jihwan Kim
Jae-Pil Heo
ViT
29
0
0
03 Nov 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
33
6
0
14 Oct 2024
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph
Jerome Sieber
M. Zeilinger
Carmen Amo Alonso
33
0
0
14 Oct 2024
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
Xinhao Yao
Hongjin Qian
Xiaolin Hu
Gengze Xu
Wei Liu
Jian Luan
B. Wang
Y. Liu
48
0
0
03 Oct 2024
Sampling Foundational Transformer: A Theoretical Perspective
Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen
Minh Lenhat
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
42
0
0
11 Aug 2024
An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton
  Stepsizes
An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes
Antonio Orvieto
Lin Xiao
37
2
0
05 Jul 2024
Reasoning in Large Language Models: A Geometric Perspective
Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino
Sarath Shekkizhar
LRM
44
2
0
02 Jul 2024
Understanding and Minimising Outlier Features in Neural Network Training
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
34
3
0
29 May 2024
Infinite Limits of Multi-head Transformer Dynamics
Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon
Hamza Tahir Chaudhry
C. Pehlevan
AI4CE
42
9
0
24 May 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent
  on Language Models
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
31
26
0
29 Feb 2024
Setting the Record Straight on Transformer Oversmoothing
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
20
5
0
09 Jan 2024
Graph Convolutions Enrich the Self-Attention in Transformers!
Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi
Hyowon Wi
Jayoung Kim
Yehjin Shin
Kookjin Lee
Nathaniel Trask
Noseong Park
25
4
0
07 Dec 2023
Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural
  Networks
Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks
Andreas Roth
Thomas Liebig
37
11
0
31 Aug 2023
How to Protect Copyright Data in Optimization of Large Language Models?
How to Protect Copyright Data in Optimization of Large Language Models?
T. Chu
Zhao-quan Song
Chiwun Yang
32
29
0
23 Aug 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive
  Transformers
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurélien Lucchi
Thomas Hofmann
34
53
0
25 May 2023
Formal Mathematics Statement Curriculum Learning
Formal Mathematics Statement Curriculum Learning
Stanislas Polu
Jesse Michael Han
Kunhao Zheng
Mantas Baksys
Igor Babuschkin
Ilya Sutskever
AIMat
79
115
0
03 Feb 2022
Stable ResNet
Stable ResNet
Soufiane Hayou
Eugenio Clerico
Bo He
George Deligiannidis
Arnaud Doucet
Judith Rousseau
ODL
SSeg
46
51
0
24 Oct 2020
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train
  10,000-Layer Vanilla Convolutional Neural Networks
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao
Yasaman Bahri
Jascha Narain Sohl-Dickstein
S. Schoenholz
Jeffrey Pennington
220
348
0
14 Jun 2018
1