Implicit Regularization of Gradient Flow on One-Layer Softmax Attention

13 March 2024

Siyu Chen

Tianhao Wang

Harrison H. Zhou

Papers citing "Implicit Regularization of Gradient Flow on One-Layer Softmax Attention"

12 / 12 papers shown

Title | Authors | Tags | Counts | Date
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity | Ruifeng Ren, Yong Liu | | 39 0 0 | 26 Apr 2025
Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias? | Tom Jacobs, Chao Zhou, R. Burkholz | OffRL, AI4CE | 23 0 0 | 17 Apr 2025
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning | Yingcong Li, Davoud Ataee Tarzanagh, A. S. Rawat, Maryam Fazel, Samet Oymak | | 23 0 0 | 06 Apr 2025
Training Dynamics of In-Context Learning in Linear Attention | Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe | MLT | 59 1 0 | 28 Jan 2025
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems | Bingcong Li, Liang Zhang, Niao He | | 36 3 0 | 18 Oct 2024
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency | Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang | MoE, LRM | 58 2 0 | 07 Oct 2024
Mask in the Mirror: Implicit Sparsification | Tom Jacobs, R. Burkholz | | 37 3 0 | 19 Aug 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers | Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam | CLL, KELM | 29 6 0 | 26 Jun 2024
Implicit Bias and Fast Convergence Rates for Self-attention | Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis | | 24 13 0 | 08 Feb 2024
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding | Yuchen Li, Yuan-Fang Li, Andrej Risteski | | 107 61 0 | 07 Mar 2023
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework | Zhiyuan Li, Tianhao Wang, Sanjeev Arora | MLT | 83 98 0 | 13 Oct 2021
Transformers in Vision: A Survey | Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, F. Khan, M. Shah | ViT | 225 2,404 0 | 04 Jan 2021