The emergence of clusters in self-attention dynamics
arXiv:2305.05465 · 9 May 2023
Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet

Papers citing "The emergence of clusters in self-attention dynamics" (35 papers)
  • A Sparse Bayesian Learning Algorithm for Estimation of Interaction Kernels in Motsch-Tadmor Model · Jinchao Feng, Sui Tang · 11 May 2025
  • Quantum Doubly Stochastic Transformers · Jannis Born, Filip Skogh, Kahn Rhrissorrakrai, Filippo Utro, Nico Wagner, Aleksandros Sobczyk · 22 Apr 2025
  • Quantitative Clustering in Mean-Field Transformer Models · Shi Chen, Zhengjiang Lin, Yury Polyanskiy, Philippe Rigollet · 20 Apr 2025
  • Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation · Zhuo-Yang Song, Zeyu Li, Qing-Hong Cao, Ming-xing Luo, Hua Xing Zhu · 28 Mar 2025
  • Lines of Thought in Large Language Models · Raphael Sarfati, Toni J. B. Liu, Nicolas Boullé, Christopher Earls · 17 Feb 2025 [LRM, VLM, LM&Ro]
  • Artificial Kuramoto Oscillatory Neurons · Takeru Miyato, Sindy Lowe, Andreas Geiger, Max Welling · 17 Feb 2025 [AI4CE]
  • Hyperspherical Energy Transformer with Recurrent Depth · Yunzhe Hu, Difan Zou, Dong Xu · 17 Feb 2025
  • Solving Empirical Bayes via Transformers · Anzo Teh, Mark Jabbour, Yury Polyanskiy · 17 Feb 2025
  • Exact Sequence Classification with Hardmax Transformers · Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua · 04 Feb 2025
  • OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization · Kelvin Kan, Xingjian Li, Stanley Osher · 30 Jan 2025
  • The Geometry of Tokens in Internal Representations of Large Language Models · Karthik Viswanathan, Yuri Gardinazzi, Giada Panerai, Alberto Cazzaniga, Matteo Biagetti · 17 Jan 2025 [AIFin]
  • The Asymptotic Behavior of Attention in Transformers · Álvaro Rodríguez Abella, João Pedro Silvestre, Paulo Tabuada · 03 Dec 2024
  • An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models · Yunzhe Hu, Difan Zou, Dong Xu · 26 Nov 2024
  • Clustering in Causal Attention Masking · Nikita Karagodin, Yury Polyanskiy, Philippe Rigollet · 07 Nov 2024
  • Emergence of meta-stable clustering in mean-field transformer models · Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi · 30 Oct 2024
  • Provable optimal transport with transformers: The essence of depth and prompt engineering · Hadi Daneshmand · 25 Oct 2024 [OT]
  • Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric · Baiyuan Chen · 23 Oct 2024 [MLT]
  • Demystifying the Token Dynamics of Deep Selective State Space Models · Thieu N. Vo, Tung D. Pham, Xin T. Tong, Tan Minh Nguyen · 04 Oct 2024 [Mamba]
  • Towards Understanding the Universality of Transformers for Next-Token Prediction · Michael E. Sander, Gabriel Peyré · 03 Oct 2024 [CML]
  • Transformers are Universal In-context Learners · Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré · 02 Aug 2024
  • SINDER: Repairing the Singular Defects of DINOv2 · Haoqian Wang, Tong Zhang, Mathieu Salzmann · 23 Jul 2024
  • A Survey on LoRA of Large Language Models · Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao · 08 Jul 2024 [ALM]
  • Clustering in pure-attention hardmax transformers and its role in sentiment analysis · Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua · 26 Jun 2024
  • Elliptical Attention · Stefan K. Nielsen, Laziz U. Abdullaev, R. Teo, Tan M. Nguyen · 19 Jun 2024
  • Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis · R. Teo, Tan M. Nguyen · 19 Jun 2024
  • Iterated Schrödinger bridge approximation to Wasserstein Gradient Flows · Medha Agarwal, Zaïd Harchaoui, Garrett Mulcahy, Soumik Pal · 16 Jun 2024
  • Continuum Attention for Neural Operators · Edoardo Calvello, Nikola B. Kovachki, Matthew E. Levine, Andrew M. Stuart · 10 Jun 2024
  • On the Role of Attention Masks and LayerNorm in Transformers · Xinyi Wu, A. Ajorlou, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie · 29 May 2024
  • Mixing Artificial and Natural Intelligence: From Statistical Mechanics to AI and Back to Turbulence · Michael Chertkov · 26 Mar 2024 [AI4CE]
  • Geometric Dynamics of Signal Propagation Predict Trainability of Transformers · Aditya Cowsik, Tamra M. Nebabu, Xiao-Liang Qi, Surya Ganguli · 05 Mar 2024
  • The Impact of LoRA on the Emergence of Clusters in Transformers · Hugo Koubbi, Matthieu Boussard, Louis Hernandez · 23 Feb 2024
  • Bridging Associative Memory and Probabilistic Modeling · Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang T. Truong, ..., Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Oluwasanmi Koyejo · 15 Feb 2024 [DiffM]
  • How Smooth Is Attention? · Valérie Castin, Pierre Ablin, Gabriel Peyré · 22 Dec 2023 [AAML]
  • A mathematical perspective on Transformers · Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet · 17 Dec 2023 [EDL, AI4CE]
  • Implicit regularization of deep residual networks towards neural ODEs · P. Marion, Yu-Han Wu, Michael E. Sander, Gérard Biau · 03 Sep 2023