The emergence of clusters in self-attention dynamics
arXiv:2305.05465 · 9 May 2023
Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet

Papers citing "The emergence of clusters in self-attention dynamics" (35 papers)
  • A Sparse Bayesian Learning Algorithm for Estimation of Interaction Kernels in Motsch-Tadmor Model · Jinchao Feng, Sui Tang · 11 May 2025
  • Quantum Doubly Stochastic Transformers · Jannis Born, Filip Skogh, Kahn Rhrissorrakrai, Filippo Utro, Nico Wagner, Aleksandros Sobczyk · 22 Apr 2025
  • Quantitative Clustering in Mean-Field Transformer Models · Shi Chen, Zhengjiang Lin, Yury Polyanskiy, Philippe Rigollet · 20 Apr 2025
  • Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation · Zhuo-Yang Song, Zeyu Li, Qing-Hong Cao, Ming-xing Luo, Hua Xing Zhu · 28 Mar 2025
  • Lines of Thought in Large Language Models · Raphael Sarfati, Toni J. B. Liu, Nicolas Boullé, Christopher Earls · 17 Feb 2025 [LRM, VLM, LM&Ro]
  • Artificial Kuramoto Oscillatory Neurons · Takeru Miyato, Sindy Lowe, Andreas Geiger, Max Welling · 17 Feb 2025 [AI4CE]
  • Hyperspherical Energy Transformer with Recurrent Depth · Yunzhe Hu, Difan Zou, Dong Xu · 17 Feb 2025
  • Solving Empirical Bayes via Transformers · Anzo Teh, Mark Jabbour, Yury Polyanskiy · 17 Feb 2025
  • Exact Sequence Classification with Hardmax Transformers · Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua · 04 Feb 2025
  • OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization · Kelvin Kan, Xingjian Li, Stanley Osher · 30 Jan 2025
  • The Geometry of Tokens in Internal Representations of Large Language Models · Karthik Viswanathan, Yuri Gardinazzi, Giada Panerai, Alberto Cazzaniga, Matteo Biagetti · 17 Jan 2025 [AIFin]
  • The Asymptotic Behavior of Attention in Transformers · Álvaro Rodríguez Abella, João Pedro Silvestre, Paulo Tabuada · 03 Dec 2024
  • An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models · Yunzhe Hu, Difan Zou, Dong Xu · 26 Nov 2024
  • Clustering in Causal Attention Masking · Nikita Karagodin, Yury Polyanskiy, Philippe Rigollet · 07 Nov 2024
  • Emergence of meta-stable clustering in mean-field transformer models · Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi · 30 Oct 2024
  • Provable optimal transport with transformers: The essence of depth and prompt engineering · Hadi Daneshmand · 25 Oct 2024 [OT]
  • Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric · Baiyuan Chen · 23 Oct 2024 [MLT]
  • Demystifying the Token Dynamics of Deep Selective State Space Models · Thieu N. Vo, Tung D. Pham, Xin T. Tong, Tan Minh Nguyen · 04 Oct 2024 [Mamba]
  • Towards Understanding the Universality of Transformers for Next-Token Prediction · Michael E. Sander, Gabriel Peyré · 03 Oct 2024 [CML]
  • Transformers are Universal In-context Learners · Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré · 02 Aug 2024
  • SINDER: Repairing the Singular Defects of DINOv2 · Haoqian Wang, Tong Zhang, Mathieu Salzmann · 23 Jul 2024
  • A Survey on LoRA of Large Language Models · Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao · 08 Jul 2024 [ALM]
  • Clustering in pure-attention hardmax transformers and its role in sentiment analysis · Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua · 26 Jun 2024
  • Elliptical Attention · Stefan K. Nielsen, Laziz U. Abdullaev, R. Teo, Tan M. Nguyen · 19 Jun 2024
  • Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis · R. Teo, Tan M. Nguyen · 19 Jun 2024
  • Iterated Schrödinger bridge approximation to Wasserstein Gradient Flows · Medha Agarwal, Zaïd Harchaoui, Garrett Mulcahy, Soumik Pal · 16 Jun 2024
  • Continuum Attention for Neural Operators · Edoardo Calvello, Nikola B. Kovachki, Matthew E. Levine, Andrew M. Stuart · 10 Jun 2024
  • On the Role of Attention Masks and LayerNorm in Transformers · Xinyi Wu, A. Ajorlou, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie · 29 May 2024
  • Mixing Artificial and Natural Intelligence: From Statistical Mechanics to AI and Back to Turbulence · Michael Chertkov · 26 Mar 2024 [AI4CE]
  • Geometric Dynamics of Signal Propagation Predict Trainability of Transformers · Aditya Cowsik, Tamra M. Nebabu, Xiao-Liang Qi, Surya Ganguli · 05 Mar 2024
  • The Impact of LoRA on the Emergence of Clusters in Transformers · Hugo Koubbi, Matthieu Boussard, Louis Hernandez · 23 Feb 2024
  • Bridging Associative Memory and Probabilistic Modeling · Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang T. Truong, ..., Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Oluwasanmi Koyejo · 15 Feb 2024 [DiffM]
  • How Smooth Is Attention? · Valérie Castin, Pierre Ablin, Gabriel Peyré · 22 Dec 2023 [AAML]
  • A mathematical perspective on Transformers · Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet · 17 Dec 2023 [EDL, AI4CE]
  • Implicit regularization of deep residual networks towards neural ODEs · P. Marion, Yu-Han Wu, Michael E. Sander, Gérard Biau · 03 Sep 2023