Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon, Hamza Tahir Chaudhry, C. Pehlevan
24 May 2024 · arXiv:2405.15712
Papers citing "Infinite Limits of Multi-head Transformer Dynamics" (11 of 11 papers shown)
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Bill Li, Blake Bordelon, Shane Bergsma, C. Pehlevan, Boris Hanin, Joel Hestness
02 May 2025
Deep Neural Nets as Hamiltonians
Mike Winer, Boris Hanin
31 Mar 2025
Function-Space Learning Rates
Edward Milsom, Ben Anson, Laurence Aitchison
24 Feb 2025
Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe
28 Jan 2025
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade
29 Oct 2024
The Optimization Landscape of SGD Across the Feature Learning Strength
Alexander B. Atanasov, Alexandru Meterez, James B. Simon, C. Pehlevan
06 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon, Alexander B. Atanasov, C. Pehlevan
26 Sep 2024
A Dynamical Model of Neural Scaling Laws
Blake Bordelon, Alexander B. Atanasov, C. Pehlevan
02 Feb 2024
The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks
Blake Bordelon, C. Pehlevan
05 Oct 2022
SCENIC: A JAX Library for Computer Vision Research and Beyond
Mostafa Dehghani, A. Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay
18 Oct 2021
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020