Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

3 October 2024

Papers citing "Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient"


Modes of Sequence Models and Learning Coefficients. Zhongtian Chen, Daniel Murfet. 25 Apr 2025.
Studying Small Language Models with Susceptibilities. Garrett Baker, George Wang, Jesse Hoogland, Daniel Murfet. 25 Apr 2025.
Emergence of Computational Structure in a Neural Network Physics Simulator. Rohan Hitchcock, Gary W. Delaney, J. Manton, Richard Scalzo, Jingge Zhu. 16 Apr 2025.
Almost Bayesian: The Fractal Dynamics of Stochastic Gradient Descent. Max Hennick, Stijn De Baerdemacker. 28 Mar 2025.